Simple question: all the tutorials I've read show how to output the result of a parallel computation to a list (or, at best, a dictionary) using either IPython.parallel or multiprocessing.
Could you point me to a simple example of outputting the result of a computation to a shared pandas DataFrame using either library?
This tutorial — http://gouthamanbalaraman.com/blog/distributed-processing-pandas.html — shows how to read the input data in chunks (code below), but how would I then combine the results of the 4 parallel computations into ONE DataFrame?
import pandas as pd
import multiprocessing as mp

LARGE_FILE = "D:\\my_large_file.txt"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # process a single chunk of the data frame
    return len(df)

if __name__ == '__main__':
    reader = pd.read_table(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 worker processes
    funclist = []
    for df in reader:
        # dispatch each chunk to the pool
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)
    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # timeout after 10 seconds
    print("There are %d rows of data" % result)