I have df_fruits, a pandas DataFrame of fruits:
index      name
1          apple
2          banana
3          strawberry
Their market prices are stored in a MySQL database like below:
category      market      price
apple         A           1.0
apple         B           1.5
banana        A           1.2
banana        A           3.0
apple         C           1.8
strawberry    B           2.7        
...
While iterating over df_fruits, I'd like to do some processing on each row.
The code below is a non-parallel version.
def process(fruit):
    # make a DB connection
    # fetch the prices of `fruit` from the database
    # do some processing with the fetched data, which takes a long time
    # insert the result into the DB
    # close the DB connection
    ...

for idx, f in df_fruits.iterrows():
    process(f)
What I want is to run process() on each row of df_fruits in parallel, since df_fruits has plenty of rows and the market prices table is quite large (fetching the data takes a long time).
As you can see, the order of execution between rows does not matter, and there is no data shared between them.
I'm confused about where to place `pool.map()` in the iteration over df_fruits. Do I need to split the rows into chunks before the parallel execution and distribute the chunks to the processes? (If so, wouldn't a process that finishes its chunk earlier than the others sit idle?)
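For reference, here is a rough sketch of what I'm imagining (process_row, the pool size of 4, and passing only the fruit name to the worker are just placeholders, not my real code):

import pandas as pd
from multiprocessing import Pool

def process_row(name):
    # each worker would open its own DB connection here, fetch the prices
    # for `name`, do the long processing, insert the result, then close
    return name  # placeholder

if __name__ == "__main__":  # required on Windows, where worker processes are spawned
    df_fruits = pd.DataFrame({"name": ["apple", "banana", "strawberry"]},
                             index=[1, 2, 3])
    with Pool(processes=4) as pool:
        # imap_unordered hands the rows out one at a time, so a worker that
        # finishes early just picks up the next row instead of sitting idle
        for result in pool.imap_unordered(process_row, df_fruits["name"]):
            pass

Is something like this the right structure, or should I pre-split df_fruits into chunks myself?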
I've looked into pandarallel, but I can't use it (my OS is Windows).
Any help would be appreciated.