I've tried to optimize a filling of square two-dimensional Java array with sums of indices at each element by computing each sum once for two elements, opposite relative to the main diagonal. But instead of speedup or, at least, comparable performance, I've got 23(!) times slower code.
My code:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(ArrayFill.N * ArrayFill.N)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ArrayFill {
    public static final int N = 8189;
    public int[][] g;
    @Setup
    public void setup() { g = new int[N][N]; }
    @GenerateMicroBenchmark
    public int simple(ArrayFill state) {
        int[][] g = state.g;
        for(int i = 0; i < g.length; i++) {
            for(int j = 0; j < g[i].length; j++) {
                g[i][j] = i + j;
            }
        }
        return g[g.length - 1][g[g.length - 1].length - 1];
    }
    @GenerateMicroBenchmark
    public int optimized(ArrayFill state) {
        int[][] g = state.g;
        for(int i = 0; i < g.length; i++) {
            for(int j = 0; j <= i; j++) {
                g[j][i] = g[i][j] = i + j;
            }
        }
        return g[g.length - 1][g[g.length - 1].length - 1];
    }
}
Benchmark results:
Benchmark               Mode     Mean   Mean error    Units
ArrayFill.simple        avgt    0.907        0.008    ns/op
ArrayFill.optimized     avgt   21.188        0.049    ns/op
The question:
How could so tremendous performance drop be explained?
P. S. Java version is 1.8.0-ea-b124, 64-bit 3.2 GHz AMD processor, benchmarks were executed in a single thread.
 
     
     
     
    