If your operation takes a long time (say, 30 seconds or more), you may benefit from splitting the work into as many pieces as you want Python processes and using Python's multiprocessing module. If the operation is faster than that, the overhead of starting new processes will likely outweigh any benefit from using them.
Since the operation being carried out does not depend on the values already stored in v, each process can write to an independent vector and you can aggregate the results at the end. Pass each process a vector v_prime of zeros with the same length as v. Each process then handles a portion of the output_diffs in results, incrementing the corresponding values in v_prime instead of v, and returns its v_prime when done. Finally, sum all of the returned v_primes together with the original v to get the correct result. This is where having the items expressed as NumPy arrays helps, since adding NumPy vectors of the same length is trivial.
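A minimal sketch of that pattern, assuming results is a list of (index, output_diff) pairs (the exact shape of your results isn't shown, so adjust the inner loop accordingly):

```python
import numpy as np
from multiprocessing import Pool

def process_chunk(args):
    """Build an independent accumulator vector for one chunk of results."""
    chunk, n = args
    v_prime = np.zeros(n)               # per-process vector of zeros
    for index, output_diff in chunk:    # assumed (index, diff) pairs
        v_prime[index] += output_diff
    return v_prime

def parallel_update(v, results, n_procs=4):
    n = len(v)
    # Split results into one chunk per process (round-robin).
    chunks = [results[i::n_procs] for i in range(n_procs)]
    with Pool(n_procs) as pool:
        v_primes = pool.map(process_chunk, [(c, n) for c in chunks])
    # Aggregate: original v plus every process's partial vector.
    return v + np.sum(v_primes, axis=0)
```

Note that on platforms where multiprocessing spawns rather than forks (Windows, recent macOS), the call to parallel_update must live under an `if __name__ == "__main__":` guard.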