pyfftw, wrapping the FFTW library, is likely faster than the FFTPACK library wrapped by np.fft and scipy.fftpack.
After all, FFTW stands for Fastest Fourier Transform in the West.
The minimal code is:
import numpy as np
import pyfftw
import multiprocessing
b = np.random.random((100, 256, 256))
bb = pyfftw.empty_aligned((100,256, 256), dtype='float64')
bf= pyfftw.empty_aligned((100,256, 129), dtype='complex128')
fft_object_b = pyfftw.FFTW(bb, bf,axes=(1,2),flags=('FFTW_MEASURE',), direction='FFTW_FORWARD',threads=multiprocessing.cpu_count())
bb=b
fft_object_b(bb)
Here is an extended code timing the execution of np.fft and pyfftw:
import numpy as np
from timeit import default_timer as timer
import multiprocessing
a = np.random.random((256, 256))
b = np.random.random((100, 256, 256))
start = timer()
for i in range(10):
np.fft.fft2(a)
end = timer()
print"np.fft.fft2, 1 slice", (end - start)/10
start = timer()
for i in range(10):
bf=np.fft.fftn(b, axes=(1, 2,))
end = timer()
print "np.fft.fftn, 100 slices", (end - start)/10
print "bf[3,42,42]",bf[3,42,42]
import pyfftw
aa = pyfftw.empty_aligned((256, 256), dtype='float64')
af= pyfftw.empty_aligned((256, 129), dtype='complex128')
bb = pyfftw.empty_aligned((100,256, 256), dtype='float64')
bf= pyfftw.empty_aligned((100,256, 129), dtype='complex128')
print 'number of threads:' , multiprocessing.cpu_count()
fft_object_a = pyfftw.FFTW(aa, af,axes=(0,1), flags=('FFTW_MEASURE',), direction='FFTW_FORWARD',threads=multiprocessing.cpu_count())
fft_object_b = pyfftw.FFTW(bb, bf,axes=(1,2),flags=('FFTW_MEASURE',), direction='FFTW_FORWARD',threads=multiprocessing.cpu_count())
aa=a
bb=b
start = timer()
for i in range(10):
fft_object_a(aa)
end = timer()
print "pyfftw, 1 slice",(end - start)/10
start = timer()
for i in range(10):
fft_object_b(bb)
end = timer()
print "pyfftw, 100 slices", (end - start)/10
print "bf[3,42,42]",bf[3,42,42]
Finally, the outcome is a significant speed up: pyfftw proves 10 times faster than np.fft on my computer., using 2 threads.
np.fft.fft2, 1 slice 0.00459032058716
np.fft.fftn, 100 slices 0.478203487396
bf[3,42,42] (-38.190256258791734+43.03902512127183j)
number of threads: 2
pyfftw, 1 slice 0.000421094894409
pyfftw, 100 slices 0.0439268112183
bf[3,42,42] (-38.19025625879178+43.03902512127183j)
Your computer seems much better than mine!