I have a task which needs to run a number of subtasks, each on its own VM, and then, when all the subtasks are complete, merge the results and present them back to the caller.
I have implemented this using a multiprocessing.Pool, and it's working great.
I now want to scale up and run several of these tasks in parallel.
My initial design was to wrap the task running in another multiprocessing.Pool, where each task runs in its own process, effectively fanning out as follows:
job
+----- task_a
| +------ subtask_a1
| +------ subtask_a2
| +------ subtask_a3
+----- task_b
+------ subtask_b1
+------ subtask_b2
+------ subtask_b3
- job starts a multiprocessing.Pool with 2 processes, one for task_a and one for task_b.
- In turn, task_a and task_b each start a multiprocessing.Pool with 3 processes, one for each of their subtasks.
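A stripped-down sketch of that structure looks roughly like this (run_task, run_subtask, and the hard-coded arguments are placeholders for illustration; my real subtasks each run on their own VM):

import multiprocessing

def run_subtask(arg):
    # Placeholder for the real work, which runs on its own VM.
    return arg * arg

def run_task(subtask_args):
    # Inner pool: one process per subtask; merge the subtask results here.
    with multiprocessing.Pool(processes=3) as inner:
        return sum(inner.map(run_subtask, subtask_args))

def run_job():
    # Outer pool: one process per task (task_a and task_b).
    with multiprocessing.Pool(processes=2) as outer:
        return outer.map(run_task, [[1, 2, 3], [4, 5, 6]])

if __name__ == '__main__':
    print(run_job())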
When I tried to run my code, I hit an assertion error:
AssertionError: daemonic processes are not allowed to have children
Searching online for details, I found the following thread, an excerpt of which reads:
As for allowing children threads to spawn off children of its own using subprocess runs the risk of creating a little army of zombie 'grandchildren' if either the parent or child threads terminate before the subprocess completes and returns
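As far as I can tell, the assertion fires because multiprocessing.Pool marks its worker processes as daemonic, and a daemonic process is not allowed to start children of its own. A quick check (my own snippet, not from that thread) shows the flag:

import multiprocessing

def show_daemon_flag(_):
    # Report whether the worker process running this call is daemonic.
    return multiprocessing.current_process().daemon

if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(show_daemon_flag, range(2)))  # prints [True, True]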
I have also found workarounds which allow this kind of "pool within a pool" use:
import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # Always report daemon=False so this process is allowed to have children.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

class MyPool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(MyPool, self).__init__(*args, **kwargs)
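Swapping MyPool in for the outer pool only seems to be enough; something like the following (reusing the placeholder run_task from the sketch above) no longer trips the assertion:

if __name__ == '__main__':
    # Outer pool uses the non-daemonic workaround; the inner pools inside
    # run_task stay as plain multiprocessing.Pool instances.
    with MyPool(processes=2) as outer:
        print(outer.map(run_task, [[1, 2, 3], [4, 5, 6]]))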
However, given the above quote about "zombie grandchildren", I suspect this is not a good design.
So I guess my question is:
- What is the Pythonic way to "fan out" multiple processes from within multiple processes?