I have a function foo that takes a small object and a large one, big_object. The large object is constant. I'm using multiprocessing to process a list of the small objects, and I want to avoid pickling/unpickling big_object on every call to foo.
It seems like the initializer argument of multiprocessing.Pool would be useful for this, but I can't get it to work (memory explodes). My current approach looks like this:
import multiprocessing as mp

big_object = None

def foo(small_object):
    global big_object
    # ... do stuff with big_object
    return result

def init(big_object_arg):
    global big_object
    big_object = big_object_arg

def main():
    [...]
    with mp.Pool(4, initializer=init, initargs=(big_object,)) as pool:
        lst_results = pool.map(foo, lst_small_objects)
This runs, but memory usage explodes. Why could this be happening?

For context: big_object is a custom C++ object defined via pybind11, and I have defined pickling functions for it. These are very slow, though, which is why I want to pickle it as rarely as possible.
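For reference, the pickle support is wired up on the C++ side via py::pickle, roughly like this. This is a simplified sketch, not my actual code: BigObject, its serialize/deserialize methods, and the module name bigmod are placeholders.

#include <pybind11/pybind11.h>
#include <string>

namespace py = pybind11;

// Stand-in for the real class; serialize()/deserialize() are placeholders
// for my actual (slow) serialization routines.
struct BigObject {
    std::string blob;
    std::string serialize() const { return blob; }
    static BigObject deserialize(const std::string &s) { return BigObject{s}; }
};

PYBIND11_MODULE(bigmod, m) {
    py::class_<BigObject>(m, "BigObject")
        .def(py::init<>())
        // Pickle support: __getstate__ dumps the object to bytes,
        // __setstate__ reconstructs it from those bytes.
        .def(py::pickle(
            [](const BigObject &o) {      // __getstate__
                return py::bytes(o.serialize());
            },
            [](const py::bytes &state) {  // __setstate__
                return BigObject::deserialize(state.cast<std::string>());
            }));
}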
