I am trying to measure the performance of Aparapi. I have seen blog posts whose results show that Aparapi improves performance for data-parallel operations.
But I cannot reproduce that in my own tests. I wrote two programs: one using Aparapi, the other using a plain loop.
Program 1: using Aparapi
import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;
public class App 
{
    public static void main( String[] args )
    {
        final int size = 50000000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        for (int i = 0; i < size; i++) {
           a[i] = (float) (Math.random() * 100);
           b[i] = (float) (Math.random() * 100);
        }
        final float[] sum = new float[size];
        Kernel kernel = new Kernel(){
           @Override public void run() {
              int gid = getGlobalId();
              sum[gid] = a[gid] + b[gid];
           }
        };
        long t1 = System.currentTimeMillis();
        kernel.execute(Range.create(size));
        long t2 = System.currentTimeMillis();
        System.out.println("Execution mode = "+kernel.getExecutionMode());
        kernel.dispose();
        System.out.println(t2-t1);
    }
}
Program 2: using loops
public class App2 {
    public static void main(String[] args) {
        final int size = 50000000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        for (int i = 0; i < size; i++) {
           a[i] = (float) (Math.random() * 100);
           b[i] = (float) (Math.random() * 100);
        }
        final float[] sum = new float[size];
        long t1 = System.currentTimeMillis();
        for(int i=0;i<size;i++) {
            sum[i]=a[i]+b[i];
        }
        long t2 = System.currentTimeMillis();
        System.out.println(t2-t1);
    }
}
Program 1 takes around 330 ms, whereas Program 2 takes only around 55 ms. Am I doing something wrong here? I printed the execution mode in the Aparapi program, and it reports that the execution mode is GPU.
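Both numbers above come from a single timed call, so each may include one-time costs: JIT warm-up for the loop and, for Aparapi, possibly the bytecode-to-OpenCL translation and buffer transfers on the first execute. A minimal sketch of how the measurement could be repeated to separate the first run from later ones is below. The class name TimingSketch, the helper timeRuns, the run count of 5, and the smaller array size are my own assumptions for illustration and are not part of the original programs.

```java
public class TimingSketch {
    // Hypothetical helper: runs the task `runs` times and
    // returns the elapsed time of each run in milliseconds.
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsedMs = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs[r] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Assumption: a smaller size than 50000000 to keep the sketch quick.
        final int size = 1_000_000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        final float[] sum = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = (float) (Math.random() * 100);
            b[i] = (float) (Math.random() * 100);
        }

        // The plain loop from Program 2. For Program 1, the body would
        // instead be the call kernel.execute(Range.create(size)).
        Runnable loop = () -> {
            for (int i = 0; i < size; i++) {
                sum[i] = a[i] + b[i];
            }
        };

        long[] times = timeRuns(loop, 5);
        for (int r = 0; r < times.length; r++) {
            System.out.println("run " + r + ": " + times[r] + " ms");
        }
    }
}
```

If the first Aparapi run turns out much slower than the later ones, that would suggest the 330 ms figure is dominated by setup rather than by the kernel itself. I have not verified this for my setup.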