I have noticed many times that small, trivial, seemingly unrelated code changes can alter the performance characteristics of a piece of Java code, sometimes dramatically.
This happens in both JMH and hand-rolled benchmarks.
For example, in a class like this:
class Class<T> implements Interface {
    private final Type field;
    Class(ClassBuilder builder) {
        field = builder.getField();
    }
    @Override
    public void method() { /* ... */ }
}
I made this code change:
class Class<T> implements Interface {
    private static Class<?> instance;
    private final Type field;
    Class(ClassBuilder builder) {
        instance = this;
        field = builder.getField();
    }
    @Override
    public void method() { /* ... */ }
}
and performance changed dramatically.
This is just one example; there are other cases where I have noticed the same thing.
I cannot figure out what causes it, and searching the web turned up nothing.
To me it looks completely uncontrollable. Maybe it has to do with how the compiled code is laid out in memory?
I do not think it is due to false sharing (see below).
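One way to check the code-layout hypothesis would be JMH's perfasm profiler, which prints the hottest generated machine-code regions (it requires Linux perf and the hsdis disassembler; the benchmark it refers to is the SpinLockBenchmark shown at the end of this question). A minimal runner sketch (the class name is just for illustration):
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class SpinLockAsmRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(SpinLockBenchmark.class.getSimpleName()) // JMH benchmark shown below
                .addProfiler("perfasm") // dump the hottest compiled-code regions
                .build();
        new Runner(options).run();
    }
}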
I'm developing a spinlock:
import jdk.internal.vm.annotation.Contended;

class SpinLock {
    // Requires the compiler option --add-exports java.base/jdk.internal.vm.annotation=<module-name>
    // (if the project is not modular, <module-name> is 'ALL-UNNAMED')
    @Contended
    private final AtomicBoolean state = new AtomicBoolean();
    void lock() {
        while (state.getAcquireAndSetPlain(true)) {
            while (state.getPlain()) { // With x86 PAUSE (via onSpinWait) we avoid an opaque load
                Thread.onSpinWait();
            }
        }
    }
    void unlock() {
        state.setRelease(false);
    }
}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class AtomicBoolean {
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(AtomicBoolean.class, "value", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    private boolean value;
    public boolean getPlain() {
        return value;
    }
    public boolean getAcquireAndSetPlain(boolean value) {
        return (boolean) VALUE.getAndSetAcquire(this, value);
    }
    public void setRelease(boolean value) {
        VALUE.setRelease(this, value);
    }
}
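To double-check that the @Contended padding is actually applied (i.e. that false sharing really is not in play), the field layout can be dumped with OpenJDK's JOL tool. A sketch, assuming the jol-core dependency is on the classpath, the sketch lives in the same package as SpinLock, and the JVM runs with -XX:-RestrictContended:
import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    public static void main(String[] args) {
        // Prints field offsets and any padding inserted by @Contended
        System.out.println(ClassLayout.parseClass(SpinLock.class).toPrintable());
        System.out.println(ClassLayout.parseClass(AtomicBoolean.class).toPrintable());
    }
}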
My hand-rolled benchmark reported 171.26ns ± 43% and a JMH benchmark reported avgt  5  265.970 ± 27.712  ns/op.
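A minimal hand-rolled harness for this kind of measurement could look something like the sketch below (simplified; the thread count matches @Threads(6) in the JMH benchmark, the iteration count is illustrative, and this is not necessarily the exact harness used):
import java.util.concurrent.CyclicBarrier;

public class HandRolledBenchmark {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 6;            // same contention level as @Threads(6)
        final int iterations = 1_000_000; // illustrative
        final SpinLock lock = new SpinLock();
        final CyclicBarrier barrier = new CyclicBarrier(threads);
        final long[] nsPerOp = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    barrier.await(); // start all threads together
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                long start = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    lock.unlock();
                }
                nsPerOp[id] = (System.nanoTime() - start) / iterations;
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        for (long result : nsPerOp) {
            System.out.println(result + " ns/op");
        }
    }
}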
When I change it like this:
class SpinLock {
    @Contended
    private final AtomicBoolean state = new AtomicBoolean();
    private final NoopBusyWaitStrategy busyWaitStrategy;
    SpinLock() {
        this(new NoopBusyWaitStrategy());
    }
    SpinLock(NoopBusyWaitStrategy busyWaitStrategy) {
        this.busyWaitStrategy = busyWaitStrategy;
    }
    void lock() {
        while (state.getAcquireAndSetPlain(true)) {
            busyWaitStrategy.reset(); // Will be inlined
            while (state.getPlain()) {
                Thread.onSpinWait();
                busyWaitStrategy.tick(); // Will be inlined
            }
        }
    }
    void unlock() {
        state.setRelease(false);
    }
}
class NoopBusyWaitStrategy {
    void reset() {}
    void tick() {}
}
My hand-rolled benchmark reported 184.24ns ± 48% and a JMH benchmark reported avgt  5  291.285 ± 20.860  ns/op.
Even though the two benchmarks report different absolute numbers, both show an increase.
This is the JMH benchmark:
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

public class SpinLockBenchmark {
    @State(Scope.Benchmark)
    public static class BenchmarkState {
        final SpinLock lock = new SpinLock();
    }
    @Benchmark
    @Fork(value = 1, warmups = 1, jvmArgsAppend = {"-Xms8g", "-Xmx8g", "-XX:+AlwaysPreTouch", "-XX:+UnlockExperimentalVMOptions", "-XX:+UseEpsilonGC", "-XX:-RestrictContended"})
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @Threads(6)
    public void run(BenchmarkState state) {
        state.lock.lock();
        state.lock.unlock();
    }
}
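The // Will be inlined comments above are an assumption; one way to verify it is to append HotSpot's inlining diagnostics to the fork's JVM arguments and check that the busyWaitStrategy.reset()/tick() calls are reported as inlined. A sketch, just extending the existing @Fork line:
@Fork(value = 1, warmups = 1, jvmArgsAppend = {
        "-Xms8g", "-Xmx8g", "-XX:+AlwaysPreTouch",
        "-XX:+UnlockExperimentalVMOptions", "-XX:+UseEpsilonGC", "-XX:-RestrictContended",
        // Diagnostic flags: print JIT inlining decisions
        "-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining"
})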
Do you have any ideas?
Does this happen with languages without a runtime, too?