I'm trying to run a program external to Python with multithreading using this code:
def handle_multiprocessing_pool(num_threads: int, partial: Callable, variable: list) -> list:
    progress_bar = TqdmBar(len(variable))
    with multiprocessing.pool.ThreadPool(num_threads) as pool:
        jobs = [
            pool.apply_async(partial, (value,), callback=progress_bar.update_progress_bar)
            for value in variable
        ]
        pool.close()
        processing_results = []
        for job in jobs:
            processing_results.append(job.get())
        pool.join()
    return processing_results
The Callable being called here loads an external program (with a C++ back-end), runs it and then extracts some data. Inside its GUI, the external program has an option to run cases in parallel, each case is assigned to a thread, from which I assumed it would be best to work with multithreading (instead of multiprocessing).
The script is running without issues, but I cannot quite manage to utilize the CPU power of our machine efficiently. The machine has 64 cores with 2 threads each. I will list some of my findings about the CPU utilisation.
- When I run the cases from the GUI, it manages to utilize 100% CPU power. 
- When I run the script on 120 threads, it seems like only half of the threads are properly engaged: 
- The external program allows me to run on two threads, however if I run 60 parallel processes on 2 threads each, the utilisation looks similar. 
- When I run two similar scripts on 60 threads each, the full CPU power is properly used: 
I have read about the Global Interpreter Lock in Python, but the multiprocessing package should circumvent this, right? Before test #4, I was assuming that for some reason the processes were still running on cores and the two threads on each were not able to run concurrently (this seems suggested here: multiprocessing.Pool vs multiprocessing.pool.ThreadPool), but especially the behaviour from #4 above is puzzling me.
I have tried the suggestions here Why does multiprocessing use only a single core after I import numpy? which unfortunately did not solve the problem.


