A better suggestion
I think a better vectorized approach would be with slicing -
(series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
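For context, I'm assuming get_trend is the usual loopy initial-trend helper (along the lines of the common Holt-Winters tutorials); a sketch of what I'm comparing against -

def get_trend(series, slen):
    # Loopy version: average of (series[i+slen] - series[i]) / slen
    # over the first slen positions
    total = 0.0
    for i in range(slen):
        total += float(series[i + slen] - series[i]) / slen
    return total / slen

Since that loop just averages (series[i+slen] - series[i]) / slen over the first slen offsets, it collapses into one sliced subtraction followed by a sum and a division by slen**2.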
Runtime test and verification -
In [139]: series = np.random.randint(11,999,(200))
...: slen= 66
...:
# Original approach
In [140]: %timeit get_trend(series, slen)
100000 loops, best of 3: 17.1 µs per loop
# Proposed approach
In [141]: %timeit (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
100000 loops, best of 3: 3.81 µs per loop
In [142]: out1 = get_trend(series, slen)
In [143]: out2 = (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
In [144]: out1, out2
Out[144]: (0.7587235996326905, 0.75872359963269054)
Comparing the average-based approach against the loopy one
Let's also add the second (vectorized) approach from the question for testing -
In [146]: np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
Out[146]: 0.75872359963269054
Its timing is better than the loopy one's and the result matches, so I suspect the way you are timing things might be off.
To leverage NumPy's vectorized ufuncs, you should be working with arrays. So, if your data is a list, convert it to an array first and then use the vectorized approach. Let's investigate this a bit more -
Case #1 : With a list of 200 elements and slen = 66
In [147]: series_list = np.random.randint(11,999,(200)).tolist()
In [148]: series = np.asarray(series_list)
In [149]: slen = 66
In [150]: %timeit get_trend(series_list, slen)
100000 loops, best of 3: 5.68 µs per loop
In [151]: %timeit np.asarray(series_list)
100000 loops, best of 3: 7.99 µs per loop
In [152]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 6.98 µs per loop
Case #2 : Scale it 10x
In [157]: series_list = np.random.randint(11,999,(2000)).tolist()
In [159]: series = np.asarray(series_list)
In [160]: slen = 660
In [161]: %timeit get_trend(series_list, slen)
10000 loops, best of 3: 53.6 µs per loop
In [162]: %timeit np.asarray(series_list)
10000 loops, best of 3: 65.4 µs per loop
In [163]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 8.71 µs per loop
So, it's the overhead of converting to an array that's hurting you!
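If your data arrives as a list but you compute the trend (or anything else NumPy-based) on it more than once, convert it to an array up-front so the conversion cost is paid only once. A minimal sketch (the helper name vectorized_trend is just mine, not from the question) -

import numpy as np

def vectorized_trend(series, slen):
    # Accepts a list or an array; np.asarray is a no-op for arrays,
    # so the conversion cost is paid at most once, here
    series = np.asarray(series)
    return (series[slen:2*slen] - series[:slen]).sum() / float(slen**2)

Better still, if the same series feeds several NumPy computations, convert it to an array once at the boundary of your code and pass the array around.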
Comparing the sum-based approach against the average-based one
On the third part, comparing the sum-based code against the average-based one: np.average is indeed slower than "manually" doing it with summation. Timing this as well -
In [173]: a = np.random.randint(0,1000,(1000))
In [174]: %timeit np.sum(a)/float(len(a))
100000 loops, best of 3: 4.36 µs per loop
In [175]: %timeit np.average(a)
100000 loops, best of 3: 7.2 µs per loop
np.mean is a faster alternative to np.average -
In [179]: %timeit np.mean(a)
100000 loops, best of 3: 6.46 µs per loop
Now, looking into the source code for np.average, it seems to be using np.mean under the hood. This explains why it's slower than calling np.mean directly, since that way we avoid the extra function call overhead. On the tussle between np.sum and np.mean, I think np.mean does take care of overflow when we are adding a huge number of elements, which we might run into with np.sum. So, to be on the safe side, I guess it's better to go with np.mean.
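To make the overflow point concrete, here's a minimal sketch (not from the original timings): for integer inputs, np.mean accumulates in float64, while np.sum keeps an integer accumulator that can silently wrap -

import numpy as np

# Ten copies of 2**62 as int64; the true sum (10 * 2**62) exceeds the int64 range
a = np.full(10, 2**62, dtype=np.int64)

print(a.sum())     # integer accumulator wraps around: -9223372036854775808
print(np.mean(a))  # float64 accumulation stays correct: 4.611686018427388e+18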