I am trying to test whether var & 3 is faster than var % 4 in Java (more generally, & (2^n - 1) vs. % 2^n). I wrote a simple program to measure the average time each calculation takes, but I get strange results and can't draw a conclusion. With about 1000 calculations, mod 4 takes much more time on average, but with about 1000000 calculations both averages are about the same. I suspect this is due to Java optimizing my code, but I am not sure.
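To be precise about the two forms being compared, here is a minimal sketch. The class and method names (MaskVsMod, mod, mask) are made up for illustration, and 2^n is written as 1 << n because ^ is XOR in Java. It assumes n is between 0 and 30 so the shift stays within a positive int.

```java
public class MaskVsMod {
    // i % 2^n, with 2^n written as (1 << n); assumes 0 <= n <= 30.
    static int mod(int i, int n) {
        return i % (1 << n);
    }

    // i & (2^n - 1), i.e. keep only the lowest n bits of i.
    static int mask(int i, int n) {
        return i & ((1 << n) - 1);
    }

    public static void main(String[] args) {
        // Print both results side by side for a few sample values, with n = 2 (so % 4 vs. & 3).
        for (int i : new int[] {0, 1, 5, 7, 8, -1, -5}) {
            System.out.println(i + ": % 4 = " + mod(i, 2) + ", & 3 = " + mask(i, 2));
        }
    }
}
```

For non-negative i the two results agree. For negative i they differ, because % in Java takes the sign of the dividend (-5 % 4 is -1, while -5 & 3 is 3). My test only uses non-negative i, so this does not affect the timings.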
Which of those two operations is supposed to be faster, and how is % implemented?
Thanks!
EDIT: Here is my test program.
    long startTime, time, sum;
    int iterations = 1000;
    int v;
    sum = 0;
    for(int i = 0; i < iterations; i++)
    {
        startTime = System.nanoTime();
        v = i % 4;
        time = System.nanoTime();
        sum += time-startTime;
    }
    System.out.println("Mod 4 : "+(sum/iterations));
    sum = 0;
    for(int i = 0; i < iterations; i++)
    {
        startTime = System.nanoTime();
        v = i & 3;
        time = System.nanoTime();
        sum += time-startTime;
    }
    System.out.println("& 3 : "+(sum/iterations));
With 100 iterations, I get 130 ns for mod 4 and 25060 ns for & 3.
With 1000 iterations, I get 1792 ns for mod 4 and 81 ns for & 3.
With 1000000 iterations, I get about 50 ns for both, with mod 4 always a few nanoseconds slower.
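For reference, a variant of the test that times the whole loop once, instead of calling System.nanoTime() around every single operation, might look like the sketch below. It assumes that the per-call timer overhead and JIT warm-up are what distort the numbers above, which I have not verified. The class name, the sink field, and the warm-up count of 20 are arbitrary choices for illustration, and this is not a rigorous benchmark (a harness such as JMH would be more reliable).

```java
public class ModVsMaskBenchmark {
    // Each run stores its result here, which makes it less likely
    // that the JIT removes the loop as dead code.
    static volatile int sink;

    // Times `iterations` evaluations of i % 4 with one pair of nanoTime() calls.
    static long timeMod(int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += i % 4;
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    // Same loop, but with i & 3 instead of i % 4.
    static long timeMask(int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += i & 3;
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;

        // Warm-up runs (the count is arbitrary) so both loops can be JIT-compiled
        // before the measured runs.
        for (int w = 0; w < 20; w++) {
            timeMod(iterations);
            timeMask(iterations);
        }

        long mod = timeMod(iterations);
        long mask = timeMask(iterations);
        System.out.println("Mod 4 : " + (double) mod / iterations + " ns/op");
        System.out.println("& 3 : " + (double) mask / iterations + " ns/op");
    }
}
```

The idea is that a single nanoTime() pair around a million operations spreads the timer cost over the whole loop, whereas my original test mostly measures the two nanoTime() calls themselves.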
 
    