I was experimented with multithreading and parallel programming. According to my current knowledge, for intensive cpu computations, the optimal thread number is equal to the number of cores, otherwise u'll have an overhead regarding context switch between threads. I am not involving hyperthreading in the game where you can double the logical processors. I tried to play with a parallel quicksort on a 4 core cpu with disabled hyperthreading.
This is a result set that made me a bit curious.
I am a bit confused reading numerous of answers why this can happen.
I can understand that 8 threads have no significant performance, but why they compute faster than 4 threads ?
CPU Cores : 4
10 tests ran with 1 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 3579
10 tests ran with 2 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 2458
10 tests ran with 4 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1230
10 tests ran with 8 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1030
10 tests ran with 16 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1026
10 tests ran with 32 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1036
10 tests ran with 64 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1094
JMH results on 30million sample array with 10 iterations
Benchmark            Mode  Cnt     Score     Error  Units
Sample.run16Threads  avgt   10  1057.336 ±  38.845  ms/op
Sample.run1Thread    avgt   10  3385.159 ±  15.619  ms/op
Sample.run2Threads   avgt   10  1809.542 ±  23.794  ms/op
Sample.run32Threads  avgt   10  1133.388 ±  86.976  ms/op
Sample.run4Threads   avgt   10  1120.386 ± 113.688  ms/op
Sample.run64Threads  avgt   10  1546.910 ± 488.938  ms/op
Sample.run8Threads   avgt   10  1031.181 ±  52.650  ms/op
public class ParallelQuickSort extends RecursiveAction {
    private final int left;
    private final int right;
    private final int[] arr;
    public ParallelQuickSort(int left, int right, int[] arr) {
        this.left = left;
        this.right = right;
        this.arr = arr;
    }
    private int partition(int[] arr, long pivot) {
        int leftPtr = left - 1;
        int rightPtr = right;
        while ( true ) {
            while ( arr[++leftPtr] < pivot ) ;
            while ( rightPtr > 0 && arr[--rightPtr] > pivot ) ;
            if ( leftPtr >= rightPtr ) {
                break;
            } else {
                swap( arr, leftPtr, rightPtr );
            }
        }
        swap( arr, leftPtr, right );
        return leftPtr;
    }
    private void swap(int[] arr, int x, int y) {
        int temp = arr[x];
        arr[x] = arr[y];
        arr[y] = temp;
    }
    @Override
    protected void compute() {
        if ( right - left <= 0 ) {
            return;
        }
        long pivot = arr[right];
        int partition = partition( arr, pivot );
        ForkJoinTask.invokeAll( createSubTasks( partition ) );
    }
    private List<ParallelQuickSort> createSubTasks(int partition) {
        return List.of(
                new ParallelQuickSort( left, partition - 1, arr ),
                new ParallelQuickSort( partition + 1, right, arr )
        );
    }
}
public class Main {
    public static void main(String[] args) {
        SecureRandom secureRandom = new SecureRandom();
        int[] arr = IntStream.range( 0, 30_000_000 ).map( i -> secureRandom.nextInt( 200_000 ) ).toArray();
        int tests = 10;
        System.out.println( "CPU Cores : " + Runtime.getRuntime().availableProcessors() + "\n" );
        for ( int threads = 1; threads <= 64; threads *= 2 ) {
            long time = 0;
            for ( int i = 0; i < tests; ++i ) {
                int[] sample = Arrays.copyOfRange( arr, 0, arr.length );
                ForkJoinPool forkJoinPool = new ForkJoinPool(threads);
                long start = System.currentTimeMillis();
                forkJoinPool.invoke( new ParallelQuickSort( 0, arr.length - 1, sample ) );
                long end = System.currentTimeMillis();
                forkJoinPool.shutdown();
                time += end - start;
                //System.out.println( isSorted( sample ) );
            }
            System.out.println( tests + " tests ran with " + threads +
                    " thread(s), on an array of size : " + arr.length + ". Sorting process took on avg for each test : " + time / tests );
        }
    }
    private static boolean isSorted(int[] arr) {
        boolean isSorted = true;
        for ( int i = 0; i < arr.length - 1; ++i ) {
            if ( arr[i] > arr[i + 1] ) {
                isSorted = false;
                break;
            }
        }
        return isSorted;
    }
}
