I'm trying to simulate some processes in order to gather statistics. Since each test run is independent, I decided to write the simulation program using multiple threads.
For example, if I need to perform 1000 test runs, I should be able to use 4 threads, each doing 250 test runs.
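The split I have in mind can be sketched as follows. This is only an illustration: `split_cycles` is a hypothetical helper that is not part of my program, and unlike my program it spreads any remainder over the first workers instead of dropping it.

```python
def split_cycles(num_cycles, num_workers):
    """Return a list with the number of test runs assigned to each worker."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(num_cycles, num_workers)
    # The first `extra` workers take one additional run each.
    return [base + 1 if i < extra else base for i in range(num_workers)]


print(split_cycles(1000, 4))  # [250, 250, 250, 250]
print(split_cycles(10, 4))    # [3, 3, 2, 2]
```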
While doing this, I found that adding more threads does not decrease the simulation time.
I have a Windows 10 laptop with 4 physical cores (psutil reports 8 logical CPUs).
I wrote a simple program that shows the behaviour I'm talking about:
from concurrent.futures import ThreadPoolExecutor
import random
import time

import psutil


def runScenario():
    # One test run: pure CPU-bound work that counts 10000 random values.
    d = dict()
    for _ in range(10000):
        rval = random.random()
        if rval in d:
            d[rval] += 1
        else:
            d[rval] = 1
    return len(d)


def runScenarioMultipleTimesSingleThread(taskId, numOfCycles):
    # Work done by a single thread: numOfCycles sequential test runs.
    print('starting thread {}, numOfCycles is {}'.format(taskId, numOfCycles))
    total = 0
    for _ in range(numOfCycles):
        total += runScenario()
    print('thread {} finished'.format(taskId))
    return total


def modelAvg(numOfCycles, numThreads):
    # Split the runs evenly; any remainder is dropped.
    cyclesPerThread = numOfCycles // numThreads
    numOfCycles = cyclesPerThread * numThreads
    with ThreadPoolExecutor(max_workers=numThreads) as pool:
        futures = [pool.submit(runScenarioMultipleTimesSingleThread, i, cyclesPerThread)
                   for i in range(numThreads)]
        # result() blocks until the corresponding thread has finished.
        total = sum(future.result() for future in futures)
    return total / numOfCycles


def main():
    p = psutil.Process()
    print('cpus:{}, affinity{}'.format(psutil.cpu_count(), p.cpu_affinity()))
    start = time.time()
    modelAvg(numOfCycles=10000, numThreads=4)
    end = time.time()
    print('simulation took {}'.format(end - start))


if __name__ == '__main__':
    main()
These are the results (times are in seconds):
One thread (numThreads = 1):
cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 10000
thread 0 finished
simulation took 23.542529582977295
Four threads (numThreads = 4):
cpus:8, affinity[0, 1, 2, 3, 4, 5, 6, 7]
starting thread 0, numOfCycles is 2500
starting thread 1, numOfCycles is 2500
starting thread 2, numOfCycles is 2500
starting thread 3, numOfCycles is 2500
thread 1 finished
thread 2 finished
thread 0 finished
thread 3 finished
simulation took 23.508538484573364
I expected that with 4 threads the simulation time would ideally be 4 times smaller, and of course it should not stay the same.
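For comparison, here is a sketch of the same experiment with the pool class made a parameter, so that a process pool can be swapped in for the thread pool. It rests on an assumption I have not verified on my machine: that the missing speedup comes from CPython's global interpreter lock, which lets only one thread at a time execute Python bytecode. The names `run_scenario`, `run_batch`, `model_avg`, `executor_cls` and `size` are made up for this sketch, and I am not reporting any timings for it.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_scenario(size=10000):
    """Same CPU-bound work as runScenario: count distinct random values."""
    counts = {}
    for _ in range(size):
        rval = random.random()
        counts[rval] = counts.get(rval, 0) + 1
    return len(counts)


def run_batch(num_cycles, size=10000):
    """Run the scenario num_cycles times in one worker and sum the results."""
    return sum(run_scenario(size) for _ in range(num_cycles))


def model_avg(num_cycles, num_workers, executor_cls=ProcessPoolExecutor, size=10000):
    """Split the runs evenly over the workers and return the average result."""
    cycles_per_worker = num_cycles // num_workers
    with executor_cls(max_workers=num_workers) as pool:
        futures = [pool.submit(run_batch, cycles_per_worker, size)
                   for _ in range(num_workers)]
        total = sum(f.result() for f in futures)
    return total / (cycles_per_worker * num_workers)


if __name__ == '__main__':
    # The guard is required on Windows, where worker processes re-import this module.
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        avg = model_avg(400, 4, executor_cls)
        elapsed = time.perf_counter() - start
        print('{}: avg {}, took {:.2f} s'.format(executor_cls.__name__, avg, elapsed))
```

If the assumption holds, the process-pool run should finish noticeably faster than the thread-pool run on a multi-core machine, although the actual ratio will depend on process start-up cost and on the hardware.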