I am using the `multiprocessing` module in Python 3.7. My code is not working as expected (see this question here). Someone suggested setting `maxtasksperchild`, which I set to 1. Then, while reading the documentation, I figured it was best to set `chunksize` to 1 as well. This is the relevant part of the code:
    # Parallel Entropy Calculation
    # ============================
    node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
    pool = Pool(maxtasksperchild=1)
    start = datetime.datetime.now()
    logging.info("Start time: %s", start)
    print("Start time: ", start)
    results = pool.starmap(g._log_probability_path_ij, node_combinations, chunksize=1)
    end = datetime.datetime.now()
    print("End time: ", end)
    print("Run time: ", end - start)
    logging.info("End time: %s", end)
    logging.info("Total run time: %s", end - start)
    pool.close()
    pool.join()
This backfired enormously. Setting only `maxtasksperchild` or only `chunksize` got the job done in the expected time (on a smaller dataset that I use to test the code). Setting both, the job just wouldn't finish, and after a few seconds nothing was really running (I checked with htop to see whether the cores were working).
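To take my own modules out of the picture, here is a minimal, self-contained sketch of the same setup. `dummy_task` is a cheap stand-in of mine for `g._log_probability_path_ij`, not the real computation; the loop times all four combinations of the two settings on identical inputs:

    from multiprocessing import Pool
    import time

    def dummy_task(i, j):
        # Cheap CPU-bound stand-in for the real per-pair computation.
        return sum(k * k for k in range(1000)) + i + j

    if __name__ == "__main__":
        args = [(i, j) for i in range(40) for j in range(40)]
        # None means "use the default" for that setting.
        for maxtasks, chunk in [(None, None), (1, None), (None, 1), (1, 1)]:
            t0 = time.perf_counter()
            with Pool(maxtasksperchild=maxtasks) as pool:
                pool.starmap(dummy_task, args, chunksize=chunk)
            print("maxtasksperchild=%s, chunksize=%s: %.2fs"
                  % (maxtasks, chunk, time.perf_counter() - t0))

I would expect the last combination, `maxtasksperchild=1` together with `chunksize=1`, to be the slow one here, since that is the configuration that stalls with my real data.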
Questions

- Do `maxtasksperchild` and `chunksize` conflict when set together?
- Do they do the same thing?
- Do they simply act at different levels, `maxtasksperchild` at the `Pool()` level and `chunksize` at the level of the `Pool` methods? (See the small experiment after this list.)
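To make the last question concrete, this is where each setting is passed in a toy example of mine (not taken from the docs), together with one observable difference: with `maxtasksperchild=1` and `chunksize=1`, every task should come back from a different worker PID:

    from multiprocessing import Pool
    import os

    def report(_):
        # Return the PID of the worker process that ran this task.
        return os.getpid()

    if __name__ == "__main__":
        # maxtasksperchild is fixed once, when the pool is created...
        with Pool(processes=2, maxtasksperchild=1) as pool:
            # ...while chunksize is chosen per call, on map()/starmap().
            pids = pool.map(report, range(8), chunksize=1)
        # Each worker handled exactly one chunk of one task before being
        # replaced, so all eight PIDs should be distinct.
        print(len(set(pids)), "distinct worker PIDs for 8 tasks")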
======================================================
EDIT
I understand that debugging may be impossible from the extract of code presented, so please find the full code below. The modules `graph` and `graphfile` are just small libraries written by me, available on GitHub. If you wish to run the code, you can use any of the files in the data/ directory of the mentioned GitHub repository; the script takes the dataset name as its single command-line argument (`sys.argv[1]`). Short tests are better run using F2, but F1 and F3 are the ones causing trouble on the HPC.
    import graphfile
    import graph
    from multiprocessing.pool import Pool
    import datetime
    import logging

    def remove_i_and_f(edges):
        # Drop every edge that touches node 'i'; remap every edge that
        # touches node 'f' to a self-loop on its first endpoint.
        new_edges = dict()
        for k, v in edges.items():
            if 'i' in k:
                continue
            elif 'f' in k:
                key = (k[0], k[0])
                new_edges[key] = v
            else:
                new_edges[k] = v
        return new_edges
    if __name__ == "__main__":
        import sys

        # Read data
        # =========
        graph_to_study = sys.argv[1]
        full_path = "/ComplexNetworkEntropy/"
        file = graphfile.GraphFile(full_path + "data/" + graph_to_study + ".txt")
        edges = file.read_edges_from_file()

        # Logging
        # =======
        d = datetime.date.today().strftime("%Y_%m_%d")
        log_filename = full_path + "results/" + d + "_probabilities_log_" + graph_to_study + ".log"
        logging.basicConfig(filename=log_filename, level=logging.INFO, format='%(asctime)s === %(message)s')
        logging.info("Graph to study: %s", graph_to_study)
        logging.info("Date: %s", d)

        # Process data
        # ============
        edges = remove_i_and_f(edges)
        g = graph.Graph(edges)

        # Parallel Entropy Calculation
        # ============================
        node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
        pool = Pool(maxtasksperchild=1)
        start = datetime.datetime.now()
        logging.info("Start time: %s", start)
        print("Start time: ", start)
        results = pool.starmap(g._log_probability_path_ij, node_combinations, chunksize=1)
        end = datetime.datetime.now()
        print("End time: ", end)
        print("Run time: ", end - start)
        logging.info("End time: %s", end)
        logging.info("Total run time: %s", end - start)
        pool.close()
        pool.join()