Is it efficient to calculate many results in parallel with multiprocessing.Pool.map() when each input value is large (say 500 MB), but the input values generally contain the same large object? I am afraid that multiprocessing works by sending a pickled version of each input value to the worker processes in the pool. If no optimization is performed, this would mean sending a lot of data for each input value passed to map(). Is this the case? I had a quick look at the multiprocessing code but did not find anything obvious.
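For concreteness, here is a minimal sketch of the pattern I am worried about (the sizes and the process() function are made-up placeholders); my fear is that big_matrix gets pickled and shipped anew for every task:

```python
import numpy as np
from multiprocessing import Pool

def process(arg):
    """Toy per-task computation: one vector against the shared matrix."""
    vector, big_matrix = arg
    return big_matrix.dot(vector).sum()

if __name__ == "__main__":
    big_matrix = np.ones((8000, 8000))  # 8000*8000 float64 ~= 500 MB
    # Every input tuple embeds the very same large matrix.
    inputs = [(np.random.rand(8000), big_matrix) for _ in range(100)]
    with Pool() as pool:
        # Question: does map() pickle big_matrix once per task here?
        results = pool.map(process, inputs)
```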
More generally, what simple parallelization strategy would you recommend for doing a map() over, say, 10,000 values, each of them a tuple (vector, very_large_matrix), where the vectors are always different, but where there are only, say, 5 distinct very large matrices? (The PS and the sketch below make the structure concrete.)
PS: the big input matrices actually arrive "progressively": 2,000 vectors are first processed along with the first matrix, then 2,000 vectors with the second matrix, etc.
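To make that batch structure concrete, here is an entirely hypothetical sketch (init_worker and compute are placeholder names) of one workaround I have been wondering about: sending each matrix only once per batch through the Pool initializer, so that only the small vectors go through map():

```python
import numpy as np
from multiprocessing import Pool

def init_worker(matrix):
    """Store the current batch's matrix once per worker process."""
    global shared_matrix
    shared_matrix = matrix

def compute(vector):
    """Per-task work: only the small vector is pickled per call."""
    return shared_matrix.dot(vector)

if __name__ == "__main__":
    for batch_index in range(5):  # the 5 big matrices arrive one after the other
        matrix = np.random.rand(8000, 8000)  # stand-in for one ~500 MB matrix
        vectors = [np.random.rand(8000) for _ in range(2000)]  # 2,000 vectors per batch
        # One pool per batch: the matrix crosses the process boundary
        # once per worker (via the initializer), not once per task.
        with Pool(initializer=init_worker, initargs=(matrix,)) as pool:
            results = pool.map(compute, vectors)
```

Is this per-batch initializer trick a reasonable approach, or is there something simpler or more idiomatic?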