I'm struggling to use multithreading to calculate the relatedness between customers who have different shopping items in their baskets. I have a pandas data frame containing 1,000 customers, which means I have to calculate the relatedness 1 million times (once for every pair of customers), and this takes too long to process.
An example of the data frame looks like this:
  ID     Item       
    1    Banana    
    1    Apple     
    2    Orange    
    2    Banana    
    2    Tomato    
    3    Apple     
    3    Tomato    
    3    Orange    
Here is a simplified version of the code:
import pandas as pd

def relatedness(customer1, customer2):
    # do some calculations to measure the relation between the customers
    ...

data = pd.read_csv(data_file)  # data_file is the path to the CSV with the ID and Item columns
customers_list = list(set(data['ID']))
# pass the list itself (not a nested list) so the index and columns are plain customer IDs
relatedness_matrix = pd.DataFrame(index=customers_list, columns=customers_list)
for i in customers_list:
    for j in customers_list:
        relatedness_matrix.loc[i, j] = relatedness(i, j)
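
To make the problem reproducible, here is a minimal self-contained sketch of the kind of parallel version I'm trying to get to. It uses the sample data above instead of the CSV. The Jaccard similarity inside relatedness is only a placeholder I made up for illustration (my real calculation is different), and the baskets dictionary, the pairs list and max_workers=4 are likewise assumptions for the sketch, not part of my actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import numpy as np
import pandas as pd

# Sample data from the table above (stands in for pd.read_csv(data_file))
data = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Item": ["Banana", "Apple", "Orange", "Banana",
             "Tomato", "Apple", "Tomato", "Orange"],
})

# Build each customer's basket once, so it is not recomputed for every pair
baskets = data.groupby("ID")["Item"].apply(set).to_dict()

def relatedness(customer1, customer2):
    # Placeholder metric (assumption): Jaccard similarity of the two baskets
    a, b = baskets[customer1], baskets[customer2]
    return len(a & b) / len(a | b)

customers_list = sorted(baskets)
# All ordered pairs (i, j), in row-major order
pairs = list(product(customers_list, repeat=2))

# Distribute the pairwise calculations over a pool of worker threads
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(lambda pair: relatedness(*pair), pairs))

# pool.map preserves input order, so the flat result reshapes into the matrix
n = len(customers_list)
relatedness_matrix = pd.DataFrame(
    np.array(scores).reshape(n, n),
    index=customers_list,
    columns=customers_list,
)
print(relatedness_matrix)
```

As far as I understand, threads in CPython are limited by the GIL for CPU-bound pure-Python work, so this may not actually be faster than the nested loop; swapping in ProcessPoolExecutor (with a module-level function instead of the lambda, since lambdas cannot be pickled) might be the better fit, but I have not managed to get that working on the full data.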
 
     
     
     
    