I have a question regarding Python multiprocessing.
I have a large csv file: test.csv with 2 million lines and 2 columns:firm_id,product_id,in which the last one, namely product_id is a input of another function,say func1(product_id).
That's all about the basic information, as the file is very large and each product_id can be processed independently, I want to utilize Python's multiprocessing, which I never touched before. After Googling for a while, I found some useful information (like this and this) but none of them enable me to complete my task. I tried with the last one and edited like shows below, but it did not work,
import itertools as IT
import multiprocessing as mp
import csv
import funcitons as fdfunc # a self defined module with function func1 in it
def worker(chunk):
    return len(chunk)  
def main():    # num_procs is the number of workers in the pool
    num_procs = 2
    # chunksize is the number of lines in a chunk
    chunksize = 10**5
    pool = mp.Pool(num_procs)
    largefile = 'test.csv'
    results = []
    with open(largefile, 'r') as f,\
    open('file_to_store_resutl.csv','a+') as res_file:
        reader = csv.reader(f)       
        for chunk in iter(lambda: list(IT.islice(reader, chunksize*num_procs)), []):
            chunk = iter(chunk)
            pieces = list(iter(lambda: list(IT.islice(chunk, chunksize)), []))
            result = pool.imap(fdfunc.func1, pieces['product_id']) #pieces['product_id'] this definitely is wrong, just to show what I want to do
            writer = csv.writer(res_file)
            for item in result:
                writer.write_row(item)
            results.append(result)
main()
Anyone know how to do what I was trying?