I have the following example code, which resembles the main code I am working on. The main bottleneck I see is in the function call call_fun. Is there a way to speed this up? For example: not using a dictionary object self._d, but something else for the function lookup? In the main code, the list "names" is pretty big. One idea I sketched is shown right after the example code. You can enable the commented-out print statements for a quick understanding of the code (but please be sure to change i in range(5000000) to i in range(1) if you want to print output).
import time
names = [('f_a', ([1, 1],)), ('f_b', ([3, 4],))]
class A(object):
    def __init__(self):
        # map each name to its bound method once, up front
        self._d = {}
        for n in names:
            self._d[n[0]] = getattr(self, n[0])
    def call_fun(self, k):
        # k is a (name, args) tuple: look up the bound method, then call it
        #print " In call_fun: k: ", k
        return self._d[k[0]](*k[1])
    def f_a(self, vals):
        #print " I am here in f_a.. vals=", vals
        v = 2*vals
        return v
    def f_b(self, vals):
        v = 3*vals
        return v
# Run the code
start = time.clock()
a = A()
print "names[0]:", names[0]
for i in range(5000000):
    a.call_fun(names[0])
print "done, elapsed wall clock time (win32) in seconds: " , time.clock() - start
Here is the profiling output of the original version, from python -m cProfile --sort cumulative foo.py:
    10000009 function calls in 5.614 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.066    2.066    5.614    5.614 foo.py:1(<module>)
  5000000    2.345    0.000    3.412    0.000 foo.py:11(call_fun)
  5000000    1.067    0.000    1.067    0.000 foo.py:15(f_a)
        1    0.135    0.135    0.135    0.135 {range}
        1    0.000    0.000    0.000    0.000 foo.py:6(__init__)
        2    0.000    0.000    0.000    0.000 {time.clock}
        1    0.000    0.000    0.000    0.000 foo.py:5(A)
        2    0.000    0.000    0.000    0.000 {getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
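Since call_fun itself accounts for most of the tottime (the dict lookup, the tuple indexing, and the extra stack frame), I also wonder how much of this is plain loop overhead. Below is a minimal variant of the timing loop (same class A and names as above; xrange is used instead of range so no 5-million-element list gets built, which is what the {range} line above measures):

call_fun = a.call_fun      # bind the method lookup to a local, once
arg = names[0]
start = time.clock()
for i in xrange(5000000):  # xrange: no 5M-element list
    call_fun(arg)
print "local-binding variant, elapsed:", time.clock() - start

Would something like this be expected to help, or is the per-call overhead of call_fun unavoidable as long as the dispatch goes through a dictionary?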