I'm writing simulation with MATLAB where I used CUDA acceleration.
Suppose we have vector x and y, matrix A and scalar variables dt,dx,a,b,c.
What I found out was that by putting x,y,A into gpuArray() before running the iteration and built-in functions, the iteration could be accelerated significantly.
However, when I tried to put variables like dt,dx,a,b,c into the gpuArray(), the program would be significantly slowed down, by a factor of over 30%. (Time increased from 7s to 11s).
Why it was not a good idea to put all the variables into the gpuArray()?
(Short comment, those scalars were multiplied together with x,y,A, and was never used during the iteration alone.)