I have a StressTest class like this:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public abstract class StressTest {
  public static final int WARMUP_JIT_COMPILER = 10000;
  public interface TimedAction {
    void doAction();
  }
  public static long timeAction(int numberOfTimes, TimedAction action) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    for (int i = 0; i < WARMUP_JIT_COMPILER; i++) {
      action.doAction();
    }
    long currentTime = bean.getCurrentThreadCpuTime();
    for (int i = 0; i < numberOfTimes; i++) {
      action.doAction();
    }
    // getCurrentThreadCpuTime() reports nanoseconds; convert to milliseconds
    return (bean.getCurrentThreadCpuTime() - currentTime) / 1000000;
  }
}
And a main method that looks something like this:
private static boolean isPrime1(int n) { ... }
private static boolean isPrime2(int n) { ... }
private static boolean isPrime3(int n) { ... }
private static boolean isPrime4(int n) { ... }
private static final int NUMBER_OF_RUNS = 1000000;
public static void main(String[] args) {
  long primeNumberFinderTime1 = StressTest.timeAction(NUMBER_OF_RUNS, () -> {
    for (int i = 0; i < 100; i++) {
      isPrime1(i);
    }
  });
  long primeNumberFinderTime2 = StressTest.timeAction(NUMBER_OF_RUNS, () -> {
    for (int i = 0; i < 100; i++) {
      isPrime2(i);
    }
  });
  long primeNumberFinderTime3 = StressTest.timeAction(NUMBER_OF_RUNS, () -> {
    for (int i = 0; i < 100; i++) {
      isPrime3(i);
    }
  });
  long primeNumberFinderTime4 = StressTest.timeAction(NUMBER_OF_RUNS, () -> {
    for (int i = 0; i < 100; i++) {
      isPrime4(i);
    }
  });
}
With this setup the results are pretty much as expected: if I swap the order of the tests, the timings swap accordingly. isPrime3 is about 200 times faster than isPrime1.
My real code is a bit more complex. I have several classes that find prime numbers like this:
class PrimeNumberFinder1 {
  boolean isPrime(int i) { /* same code as in static isPrime1() */ }
}
class PrimeNumberFinder2 extends PrimeNumberFinder1 {
  @Override
  boolean isPrime(int i) { /* same code as in static isPrime2() */ }
}
class PrimeNumberFinder3 extends PrimeNumberFinder1 {
  @Override
  boolean isPrime(int i) { /* same code as in static isPrime3() */ }
}
class PrimeNumberFinder4 extends PrimeNumberFinder1 {
  @Override
  boolean isPrime(int i) { /* same code as in static isPrime4() */ }
}
And I have a class like this:
class SomeClassWithPrimeNumberFinder {
  PrimeNumberFinder1 _pnf;
  void setPrimeNumberFinder(PrimeNumberFinder1 pnf) {
    _pnf = pnf;
  }
  void stressTest() {
    StressTest.timeAction(10000000, () -> {
      for (int i = 0; i < 100; i++) {
        _pnf.isPrime(i);
      }
    });
  }
}
And my main method:
public static void main(String[] args) {
  SomeClassWithPrimeNumberFinder sc = new SomeClassWithPrimeNumberFinder();
  sc.setPrimeNumberFinder(new PrimeNumberFinder1());
  sc.stressTest();
  sc.setPrimeNumberFinder(new PrimeNumberFinder2());
  sc.stressTest();
  sc.setPrimeNumberFinder(new PrimeNumberFinder3());
  sc.stressTest();
  sc.setPrimeNumberFinder(new PrimeNumberFinder4());
  sc.stressTest();
}
With this setup PrimeNumberFinder1 is about as fast as isPrime1() in the first test, but PrimeNumberFinder3 is about 200 times slower than isPrime3() in the first test.
If I move PrimeNumberFinder3 so it runs first, I get the same time as isPrime3() in the first test. The other timings are also a bit slower (5-10%), but nothing like PrimeNumberFinder3.
The first three PrimeNumberFinders are just loops and ifs, with no state involved. The last one has a constructor that creates a lookup list, but is otherwise just a simple loop as well. If I take the code out of the constructor and create the lookup list with an array literal, the timing is identical.
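For anyone who wants to try this, here is a minimal, self-contained sketch of the second setup. The class name OrderingRepro and the two finder bodies (NaiveFinder, SqrtFinder) are hypothetical stand-ins, since the real implementations are elided above. One deliberate difference from the code in the question: the results are added into a sink field so the JIT cannot treat the calls as dead code, which may or may not be relevant to the timings described.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class OrderingRepro {
    // Hypothetical stand-in for a slow finder: plain trial division.
    static class NaiveFinder {
        boolean isPrime(int n) {
            if (n < 2) return false;
            for (int d = 2; d < n; d++) {
                if (n % d == 0) return false;
            }
            return true;
        }
    }

    // Hypothetical stand-in for a faster finder: trial division up to sqrt(n).
    static class SqrtFinder extends NaiveFinder {
        @Override
        boolean isPrime(int n) {
            if (n < 2) return false;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) return false;
            }
            return true;
        }
    }

    // Written to on every timing run so the computed counts stay observable.
    static volatile long sink;

    // Counts primes below limit; every finder goes through this one call site.
    static int countPrimes(NaiveFinder finder, int limit) {
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (finder.isPrime(i)) count++;
        }
        return count;
    }

    // Returns CPU time in milliseconds for `runs` passes after a warm-up phase.
    static long time(NaiveFinder finder, int runs) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (int i = 0; i < 10_000; i++) total += countPrimes(finder, 100);
        long start = bean.getCurrentThreadCpuTime();
        for (int i = 0; i < runs; i++) total += countPrimes(finder, 100);
        long elapsedMs = (bean.getCurrentThreadCpuTime() - start) / 1_000_000;
        sink = total;
        return elapsedMs;
    }

    public static void main(String[] args) {
        NaiveFinder naive = new NaiveFinder();
        NaiveFinder sqrt = new SqrtFinder();
        // Pass "fast-first" to swap the order in which the finders are timed.
        boolean fastFirst = args.length > 0 && args[0].equals("fast-first");
        NaiveFinder[] order = fastFirst
                ? new NaiveFinder[] {sqrt, naive}
                : new NaiveFinder[] {naive, sqrt};
        for (NaiveFinder f : order) {
            System.out.println(f.getClass().getSimpleName() + ": " + time(f, 1_000_000) + " ms");
        }
    }
}
```

Running it once with no arguments and once with fast-first should show whether the order in which the subclasses reach the shared isPrime call site changes the timings on a given JVM.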
Any ideas why this is happening?