So I have an algorithm I am writing, and the function Multiprocess is supposed to call another function, CreateMatrixMp(), on as many processes as there are CPUs, in parallel. I have never done multiprocessing before and cannot be certain which of the two methods below is more efficient. "Efficient" here means in the context of CreateMatrixMp() potentially needing to be called thousands of times. I have read all of the documentation on the Python multiprocessing module and have come to these two possibilities:
First is using the Pool class:
def MatrixHelper(self, args):
    return self.CreateMatrixMp(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    poolCount = cpus * 2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
And next is using the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Pool seems to be the better choice: I have read that it causes less overhead, and Process does not consider the number of CPUs on the machine. The only problem is that using Pool in this manner gives me error after error, and whenever I fix one there is a new one underneath it. Process seems easier to implement, and for all I know it may be the better choice. What does your experience tell you?
If Pool should be used, am I right in choosing map()? It would be preferred that order is maintained. I have tempData = pool.map(...) because map() is supposed to return a list of the results of every call, in input order. I am not sure how Process handles its returned data.