The aim is to compute the correlation (or convolution, i.e. correlation with a flipped kernel) of two 2D arrays using the iOS Accelerate framework, for speed.
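For reference, this is what a direct (spatial-domain) "valid" correlation computes, and how convolution reduces to correlation with the kernel rotated by 180 degrees. It is a plain C sketch with hypothetical helper names, not Accelerate's implementation, and whether a particular library routine flips the kernel should be checked in its documentation. Each output pixel costs kh*kw multiply-adds, so the total is (ah-kh+1)(aw-kw+1)*kh*kw, which is why direct methods slow down so sharply as both arrays grow.

```c
#include <stddef.h>

/* Direct "valid"-mode 2D cross-correlation of image a (ah x aw) with
   kernel k (kh x kw), all row-major. Assumes kh <= ah and kw <= aw.
   out must hold (ah-kh+1) x (aw-kw+1) floats. */
void correlate2d_valid(const float *a, size_t ah, size_t aw,
                       const float *k, size_t kh, size_t kw,
                       float *out)
{
    size_t oh = ah - kh + 1, ow = aw - kw + 1;
    for (size_t y = 0; y < oh; y++)
        for (size_t x = 0; x < ow; x++) {
            float sum = 0.0f;
            /* kh*kw multiply-adds per output pixel */
            for (size_t i = 0; i < kh; i++)
                for (size_t j = 0; j < kw; j++)
                    sum += a[(y + i) * aw + (x + j)] * k[i * kw + j];
            out[y * ow + x] = sum;
        }
}

/* Rotate the kernel by 180 degrees (flip rows and columns). */
void flip2d(const float *k, size_t kh, size_t kw, float *flipped)
{
    for (size_t i = 0; i < kh; i++)
        for (size_t j = 0; j < kw; j++)
            flipped[(kh - 1 - i) * kw + (kw - 1 - j)] = k[i * kw + j];
}

/* Convolution = correlation with the flipped kernel.
   scratch must hold kh*kw floats. */
void convolve2d_valid(const float *a, size_t ah, size_t aw,
                      const float *k, size_t kh, size_t kw,
                      float *out, float *scratch)
{
    flip2d(k, kh, kw, scratch);
    correlate2d_valid(a, ah, aw, scratch, kh, kw, out);
}
```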
- My first attempt was with vImageConvolve_PlanarF / vDSP_imgfir, which was good for smaller arrays. But as the array size increased, performance dropped drastically, since it is an O(n^2) implementation, as the Accelerate developers themselves mention here(1).
- I moved to an FFT implementation(2) to reduce the complexity to O(n log2 n). Using vDSP_fft2d_zip gained some speed, but vDSP_fft2d_zip works only on power-of-2 dimensions, so 2D arrays of other sizes have to be zero-padded. For example, a 640 * 480 2D array has to be padded to 1024 * 512. Other FFT implementations like FFTW or OpenCV's DFT efficiently handle sizes that can be expressed as size = 2^p * 3^q * 5^r, which lets FFTW/OpenCV transform a 640 * 480 2D array at its original size.
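To make the padding concrete, here is a small C sketch (helper names are hypothetical) that computes both candidate sizes for one dimension: the next power of two, which a power-of-2-only transform like vDSP_fft2d_zip needs, and the next 2^p * 3^q * 5^r size. OpenCV exposes the latter as cv::getOptimalDFTSize. A linear search is enough for image-sized dimensions.

```c
#include <stddef.h>

/* Smallest power of two >= n (n >= 1); no overflow check. */
size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* 1 if n has no prime factors other than 2, 3 and 5. */
static int is_smooth235(size_t n)
{
    if (n == 0) return 0;
    while (n % 2 == 0) n /= 2;
    while (n % 3 == 0) n /= 3;
    while (n % 5 == 0) n /= 5;
    return n == 1;
}

/* Smallest size >= n of the form 2^p * 3^q * 5^r. */
size_t next_smooth235(size_t n)
{
    while (!is_smooth235(n)) n++;
    return n;
}
```

For 640 * 480 this yields 1024 * 512 versus 640 * 480. If the full (not just "valid") correlation is wanted, each dimension would first be grown to image + kernel - 1 before rounding up.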
So for 2D arrays of size 640*480, an Accelerate vs FFTW/OpenCV comparison is effectively between 1024*512 and 640*480 dimensions. Whatever performance gains I get from Accelerate's FFT on 2D arrays are therefore nullified by its inability to perform FFTs at reasonable dimensions like size = 2^p * 3^q * 5^r.
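A rough count of what the padding costs, using only the sizes above and treating FFT cost as proportional to P log2 P for P transformed points. This crude model ignores constant factors, e.g. that radix-3 and radix-5 passes cost more per point than radix-2 passes:

$$
\frac{1024 \cdot 512}{640 \cdot 480} = \frac{524288}{307200} \approx 1.71,
\qquad
\frac{524288 \,\log_2 524288}{307200 \,\log_2 307200} \approx \frac{524288 \cdot 19}{307200 \cdot 18.23} \approx 1.78
$$

Under this model the padded transform does roughly 70 to 80 percent more work, which is consistent with the observation that the padding eats up Accelerate's speed advantage.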
Two questions:
- Am I missing any Accelerate functionality to do this easily? For example, is there an Accelerate function that can perform a 2D FFT at size = 2^p * 3^q * 5^r? I assume vDSP_DFT_Execute performs only 1D FFTs.
- Are there better approaches to 2D FFT or correlation? For example, this answer(3) suggests splitting arrays, e.g. 480 = 256 + 128 + 64 + 32, with repeated 1D FFTs over rows and then over columns. But this needs too many function calls and, I assume, will not help performance.
Of lesser importance: I am doing the correlation in tiles, as one of the 2D arrays is far bigger than the other, e.g. 1920*1024 vs 150*100.
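The tile bookkeeping can be sketched as overlap-save style tiling: with an FFT tile of size F and a kernel of size k along one dimension, circular correlation yields F - k + 1 non-wrapped ("valid") outputs, so consecutive input tiles start F - k + 1 apart and overlap by k - 1. The helper below is hypothetical; it assumes 1920*1024 and 150*100 mean width x height, and the 512 x 512 tile size in the usage is an arbitrary power-of-two choice (to suit vDSP_fft2d_zip), not a recommendation.

```c
#include <stddef.h>

/* Tiling plan for "valid" 2D correlation computed tile by tile. */
typedef struct {
    size_t out_w, out_h;     /* size of the full valid output */
    size_t step_w, step_h;   /* valid outputs produced per tile */
    size_t tiles_x, tiles_y; /* tiles needed to cover the output */
} TilePlan;

static size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

/* Returns 0 on success, -1 if the kernel does not fit the image or tile. */
int plan_tiles(size_t img_w, size_t img_h, size_t k_w, size_t k_h,
               size_t fft_w, size_t fft_h, TilePlan *p)
{
    if (k_w > img_w || k_h > img_h || k_w > fft_w || k_h > fft_h) return -1;
    p->out_w = img_w - k_w + 1;
    p->out_h = img_h - k_h + 1;
    p->step_w = fft_w - k_w + 1;   /* non-wrapped outputs per tile */
    p->step_h = fft_h - k_h + 1;
    p->tiles_x = ceil_div(p->out_w, p->step_w);
    p->tiles_y = ceil_div(p->out_h, p->step_h);
    return 0;
}
```

For plan_tiles(1920, 1024, 150, 100, 512, 512, &p), this gives a 1771 x 925 valid output covered by 5 x 3 tiles, each contributing up to 363 x 413 outputs. Tile (tx, ty) reads input starting at (tx * step_w, ty * step_h), zero-padded where it runs past the image edge. Because the tile size is fixed, the kernel's FFT only needs to be computed once and can be reused for every tile.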
 
     
    