I am trying to parallelize a piece of code using the multiprocessing library, the outline of which is as follows:
import multiprocessing as mp
import time

def stats(a, b, c, d, queue):
    t = time.time()
    x = stuff  # placeholder for the actual calculation
    print(f'runs in {time.time() - t} seconds')
    queue.put(x)

def stats_wrapper(a, b, c, d):
    q = mp.Queue()
    # b here is a dictionary; one process per key:value pair
    processes = [mp.Process(target=stats, args=(a, {b1: b[b1]}, c, d, q))
                 for b1 in b]
    for p in processes:
        p.start()
    results = []
    for p in processes:
        results.append(q.get())
    return results
The idea here is to parallelize the calculations in stuff by creating as many processes as there are key:value pairs in b, which can be up to 50. The speedup I'm getting is only about 20% compared to the serial implementation.
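For reference, the serial implementation I'm comparing against is essentially a plain loop over the keys, something like this (simplified, with the same placeholder calculation):

def stats_wrapper_serial(a, b, c, d):
    # simplified serial baseline: same calculation, one key:value pair at a time
    results = []
    for b1 in b:
        t = time.time()
        x = stuff  # same placeholder calculation as in stats()
        print(f'runs in {time.time() - t} seconds')
        results.append(x)
    return results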
I tried rewriting the stats_wrapper function with mp.Pool, but couldn't get the apply_async function to work with different args the way I have created them in stats_wrapper above.
Questions :-
- How can I improve the stats_wrapper function to bring down the runtime further?
- Is there a way to rewrite stats_wrapper using mp.Pool so that the stats function can receive a different value for the parameter b in different processes? Something like :-

def pool_stats_wrapper(a, b, c, d):
    q = mp.Queue()
    pool = mp.Pool()
    pool.apply_async(stats, ((a, {b1: b[b1]}, c, d) for b1 in b))  # Haven't figured this part yet
    pool.close()
    results = []
    for _ in range(len(b)):
        results.append(q.get())
    return results
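To make the question more concrete, the closest I could come up with is one apply_async call per key, collecting the AsyncResult objects instead of using a queue. This is an untested sketch and assumes stats is changed to return x rather than put it on a queue, so I may well be misusing apply_async:

def pool_stats_wrapper(a, b, c, d):
    # untested sketch: one task per key:value pair in b,
    # assuming stats(a, b, c, d) returns x instead of using a queue
    with mp.Pool() as pool:
        async_results = [pool.apply_async(stats, (a, {b1: b[b1]}, c, d)) for b1 in b]
        return [r.get() for r in async_results]

I'm not sure this is the intended way to pass different arguments to each task in a pool, which is really what I'm asking.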
I'm running the code on an 8-core, 16 GB machine, if that helps.
