A better suggestion
I think a better vectorized approach would be with slicing -
(series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
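For context, I'm assuming get_trend is the usual loopy initial-trend helper (along the lines of the common Holt-Winters tutorials); a sketch of what I'm comparing against -

def get_trend(series, slen):
    # Loopy version: average of (series[i+slen] - series[i]) / slen
    # over the first slen positions
    total = 0.0
    for i in range(slen):
        total += float(series[i + slen] - series[i]) / slen
    return total / slen

Since that loop just averages (series[i+slen] - series[i]) / slen over the first slen offsets, it collapses into one sliced subtraction followed by a sum and a division by slen**2.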
Runtime test and verification -
In [139]: series = np.random.randint(11,999,(200))
...: slen= 66
...:
# Original approach
In [140]: %timeit get_trend(series, slen)
100000 loops, best of 3: 17.1 µs per loop
# Proposed approach
In [141]: %timeit (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
100000 loops, best of 3: 3.81 µs per loop
In [142]: out1 = get_trend(series, slen)
In [143]: out2 = (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
In [144]: out1, out2
Out[144]: (0.7587235996326905, 0.75872359963269054)
Comparing the average-based approach against the loopy one
Let's also add the second (vectorized) approach from the question for testing -
In [146]: np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
Out[146]: 0.75872359963269054
Its timing is better than the loopy one's and the result matches, so I suspect the way you are timing things might be off.
To leverage NumPy's vectorized ufuncs, you should be working with arrays. So, if your data is a list, convert it to an array first and then use the vectorized approach. Let's investigate this a bit more -
Case #1 : With a list of 200 elements and slen = 66
In [147]: series_list = np.random.randint(11,999,(200)).tolist()
In [148]: series = np.asarray(series_list)
In [149]: slen = 66
In [150]: %timeit get_trend(series_list, slen)
100000 loops, best of 3: 5.68 µs per loop
In [151]: %timeit np.asarray(series_list)
100000 loops, best of 3: 7.99 µs per loop
In [152]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 6.98 µs per loop
Case #2 : Scale it 10x
In [157]: series_list = np.random.randint(11,999,(2000)).tolist()
In [159]: series = np.asarray(series_list)
In [160]: slen = 660
In [161]: %timeit get_trend(series_list, slen)
10000 loops, best of 3: 53.6 µs per loop
In [162]: %timeit np.asarray(series_list)
10000 loops, best of 3: 65.4 µs per loop
In [163]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 8.71 µs per loop
So, it's the overhead of converting to an array that's hurting you!
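If your data arrives as a list but you compute the trend (or anything else NumPy-based) on it more than once, convert it to an array up-front so the conversion cost is paid only once. A minimal sketch (the helper name vectorized_trend is just mine, not from the question) -

import numpy as np

def vectorized_trend(series, slen):
    # Accepts a list or an array; np.asarray is a no-op for arrays,
    # so the conversion cost is paid at most once, here
    series = np.asarray(series)
    return (series[slen:2*slen] - series[:slen]).sum() / float(slen**2)

Better still, if the same series feeds several NumPy computations, convert it to an array once at the boundary of your code and pass the array around.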
Comparing the sum-based approach against the average-based one
On the third part, comparing the sum-based code against the average-based one: np.average is indeed slower than "manually" doing it with summation. Timing this as well -
In [173]: a = np.random.randint(0,1000,(1000))
In [174]: %timeit np.sum(a)/float(len(a))
100000 loops, best of 3: 4.36 µs per loop
In [175]: %timeit np.average(a)
100000 loops, best of 3: 7.2 µs per loop
np.mean is a faster alternative to np.average -
In [179]: %timeit np.mean(a)
100000 loops, best of 3: 6.46 µs per loop
Now, looking into the source code for np.average, it seems to be using np.mean under the hood. This explains why it's slower than calling np.mean directly, since that way we avoid the extra function call overhead. On the tussle between np.sum and np.mean, I think np.mean does take care of overflow when we are adding a huge number of elements, which we might run into with np.sum. So, to be on the safe side, I guess it's better to go with np.mean.
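To make the overflow point concrete, here's a minimal sketch (not from the original timings): for integer inputs, np.mean accumulates in float64, while np.sum keeps an integer accumulator that can silently wrap -

import numpy as np

# Ten copies of 2**62 as int64; the true sum (10 * 2**62) exceeds the int64 range
a = np.full(10, 2**62, dtype=np.int64)

print(a.sum())     # integer accumulator wraps around: -9223372036854775808
print(np.mean(a))  # float64 accumulation stays correct: 4.611686018427388e+18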