The following snippet comes from the book Python Cookbook. There are three files:
sample.pyx
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef clip(double[:] a, double min, double max, double[:] out):
    if min > max:
        raise ValueError('min must be <= max')
    if a.shape[0] != out.shape[0]:
        raise ValueError('input and output arrays must be the same size!')
    for i in range(a.shape[0]):
        if a[i] < min:
            out[i] = min
        elif a[i] > max:
            out[i] = max
        else:
            out[i] = a[i]
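For reference, the element-wise logic of the Cython loop above can be expressed in pure NumPy. This is only a sketch for checking correctness (`clip_ref` is a hypothetical helper name, not from the book):

```python
import numpy as np

def clip_ref(a, lo, hi, out):
    """Pure-NumPy reference for the Cython clip(): bound each element
    of a into [lo, hi], writing the result into out."""
    if lo > hi:
        raise ValueError('min must be <= max')
    if a.shape[0] != out.shape[0]:
        raise ValueError('input and output arrays must be the same size!')
    # clamp from below, then from above, writing the result into out
    np.minimum(np.maximum(a, lo), hi, out=out)
    return out
```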
setup.py
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("sample.pyx"))
and main.py as the test file:
import time
import numpy as np
import sample  # the compiled Cython module

b = np.random.uniform(-10, 10, size=1000000)
a = np.zeros_like(b)
since = time.time()
np.clip(b, -5, 5, a)
print(time.time() - since)
since = time.time()
sample.clip(b, -5, 5, a)
print(time.time() - since)
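As an aside, a single pair of `time.time()` calls is quite noisy for sub-millisecond measurements. A steadier sketch using the standard `timeit` module might look like this (the `sample.clip` call is commented out since it requires the compiled module):

```python
import timeit
import numpy as np

b = np.random.uniform(-10, 10, size=1_000_000)
a = np.zeros_like(b)

# best-of-several repeats smooths out warm-up effects and scheduler noise
t_numpy = min(timeit.repeat(lambda: np.clip(b, -5, 5, a),
                            number=10, repeat=5))
print('np.clip per call:', t_numpy / 10)

# the Cython version would be timed the same way:
# import sample
# t_cython = min(timeit.repeat(lambda: sample.clip(b, -5, 5, a),
#                              number=10, repeat=5))
```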
Surprisingly, NumPy runs about 2x faster than the Cython code, while the book claims the opposite. The timings on my machine are:
0.0035216808319091797  (np.clip)
0.00608062744140625    (sample.clip)
Why is that?
Thank you in advance.