I am comparing the sequential and parallel performance (using a ForkJoinPool) of an algorithm (sum of the first n numbers):
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinSumCalculator extends RecursiveTask<Long> {
    private static final ForkJoinPool FORKJOINPOOL = new ForkJoinPool();
    private final long[] numbers;
    private final int start;
    private final int end;
    public static final long THRESHOLD = 10_000;       
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int numLoops = 40;
        for(int i = 1; i <= numLoops; i++) {
            ForkJoinSumCalculator forkJoinSumCalculator = new ForkJoinSumCalculator(LongStream.rangeClosed(1, 100000000).toArray());
            FORKJOINPOOL.invoke(forkJoinSumCalculator);
        }
        System.out.println("Total time parallel:"+ (System.currentTimeMillis() - startTime));
        startTime = System.currentTimeMillis();
        for(int i = 1; i <= numLoops ; i++) {
            long seqSum = 0L;
            for(int j = 1; j <= 100000000 ; j++) {
                seqSum = seqSum + j;
            }
        }
        System.out.println("Total time sequential:"+ (System.currentTimeMillis() - startTime));
    }
    public ForkJoinSumCalculator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }
    private ForkJoinSumCalculator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }
    @Override
    protected Long compute() {
        int length = end - start;
        // calculate the sum directly if the sub-array size is less than THRESHOLD
        if (length <= THRESHOLD) {
            return computeSequentially();
        }
        // otherwise split the task in two halves: fork the left half,
        // compute the right half in the current thread, then join
        ForkJoinSumCalculator leftTask = new ForkJoinSumCalculator(numbers, start, start + length / 2);
        leftTask.fork();
        ForkJoinSumCalculator rightTask = new ForkJoinSumCalculator(numbers, start + length / 2, end);
        Long rightResult = rightTask.compute();
        Long leftResult = leftTask.join();
        return leftResult + rightResult;
    }

    // plain loop used once a sub-array is small enough
    private long computeSequentially() {
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }
}
I tried varying numLoops over a wide range of values, but the sequential approach always performs better, and by a factor of 3-4.
Shouldn't the parallel version perform better here, given that the array size is not that small?
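For reference, this is a minimal sketch of how the comparison could be set up with the array built once outside the timed loops, so that both loops measure only the summation; the class name TimingSketch is just illustrative and I have not posted measurements for this variant:

import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class TimingSketch {
    public static void main(String[] args) {
        int numLoops = 40;
        // build the input once, outside both timed regions
        long[] numbers = LongStream.rangeClosed(1, 100_000_000).toArray();
        ForkJoinPool pool = new ForkJoinPool();

        long startTime = System.currentTimeMillis();
        for (int i = 1; i <= numLoops; i++) {
            pool.invoke(new ForkJoinSumCalculator(numbers));
        }
        System.out.println("Total time parallel: " + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 1; i <= numLoops; i++) {
            long seqSum = 0L;
            for (int j = 0; j < numbers.length; j++) {
                seqSum = seqSum + numbers[j];
            }
        }
        System.out.println("Total time sequential: " + (System.currentTimeMillis() - startTime));
    }
}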