I have a function f(x) that I want to evaluate in parallel over a list of values xrange. The function does something like this:
import numpy as np

def f(x, wrange, dict1, dict2):
    out_list = []
    v1 = dict1[x]
    for w in wrange:
        v2 = dict2[x - w]
        out_list += [np.dot(v1, v2)]
    return out_list
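For concreteness, here is a self-contained toy run of f (the shapes and dictionary keys are invented for illustration — the real dictionaries are much larger):

```python
import numpy as np

def f(x, wrange, dict1, dict2):
    out_list = []
    v1 = dict1[x]
    for w in wrange:
        v2 = dict2[x - w]
        out_list += [np.dot(v1, v2)]
    return out_list

# dict1 maps x -> matrix; dict2 maps (x - w) -> vector
wrange = [0, 1]
dict1 = {2: np.eye(2)}
dict2 = {1: np.array([1.0, 2.0]), 2: np.array([3.0, 4.0])}

# Identity matrix times each vector, one result per w in wrange
print(f(2, wrange, dict1, dict2))
```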
It takes a matrix from dictionary dict1 and a vector from dictionary dict2, multiplies them together, and collects the results. My normal approach to running this in parallel would be something like:
import functools
import multiprocessing
par_func = functools.partial(f, wrange=wrange, dict1=dict1, dict2=dict2)
p = multiprocessing.Pool(4)
ssdat = p.map(par_func, xrange)
p.close()
p.join()
Now when dict1 and dict2 are big dictionaries, this causes the code to fail with the error
File "/anaconda3/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
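As I understand it, multiprocessing's connection layer prefixes each message with its byte length packed as a signed 32-bit int ("!i"), so any single pickled payload over 2**31 - 1 bytes (about 2 GiB) triggers exactly this error (if I recall correctly, Python 3.8+ switched to a larger header for big messages). The failing pack in isolation:

```python
import struct

LIMIT = 2**31 - 1  # largest length "!i" can represent

struct.pack("!i", LIMIT)  # fine: exactly at the limit
try:
    struct.pack("!i", LIMIT + 1)  # one past the limit: same struct.error
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647
```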
and I think this is because Pool is pickling copies of dict1 and dict2 for every evaluation of my function, so the payload gets too large to send. Is there an efficient way to instead share these dictionaries between the workers, e.g. as shared-memory objects? And is map the best function to do this with?