I have a dataframe with a column of text and a column of keywords.
>>> main_df.head(3)
+-------+-----------------------------------------+---------------------------------------+    
| Index |                  Text                   |               Keywords                |    
+-------+-----------------------------------------+---------------------------------------+    
|     1 | "Here is some text"                     | ["here","text"]                       |     
|     2 | "Some red birds and blue elephants"     | ["red", "bird", "blue", "elephant"]   |    
|     3 | "Please help me with my pandas problem" | ["help", "pandas", "problem"]         |    
+-------+-----------------------------------------+---------------------------------------+    
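For reference, here is a minimal reproducible construction of that frame (the Keywords column holds plain Python lists):

import pandas as pd

# Reproducible version of the sample frame above
main_df = pd.DataFrame({
    "Text": ["Here is some text",
             "Some red birds and blue elephants",
             "Please help me with my pandas problem"],
    "Keywords": [["here", "text"],
                 ["red", "bird", "blue", "elephant"],
                 ["help", "pandas", "problem"]],
}, index=[1, 2, 3])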
I use itertools.combinations to make a dataframe with all possible pairs of keywords (a minimal example of the pairing follows the table).
>>> edge_df.head(3)
+-------+--------+--------+    
| Index |  Src   |  Dst   |    
+-------+--------+--------+    
|     1 | "here" | "text" |    
|     2 | "here" | "red"  |    
|     3 | "here" | "bird" |    
+-------+--------+--------+    
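For example, with a toy stand-in for the unique word list, the combinations look like this:

from itertools import combinations

words = ["here", "text", "red"]  # toy stand-in for unique_words
print(list(combinations(words, 2)))
# [('here', 'text'), ('here', 'red'), ('text', 'red')]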
I then apply a function that goes through each keyword pair and assigns a value to edge_df['weight']: the number of texts (keyword lists) in which both words of the pair appear together (a toy illustration of the counting follows the table).
>>> edge_df.head(3)
+-------+--------+--------+--------+    
| Index |  Src   |  Dst   | Weight |    
+-------+--------+--------+--------+    
|     1 | "here" | "text" |      1 |    
|     2 | "here" | "red"  |      3 |    
|     3 | "here" | "bird" |      8 |    
+-------+--------+--------+--------+    
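To make the definition of the weight concrete, here is a toy version of the counting (just an illustration, not my actual code):

from itertools import combinations
from collections import Counter

# Each keyword list contributes one count to every pair it contains
keyword_lists = [["here", "text"], ["here", "red"], ["here", "red"]]
pair_counts = Counter()
for kws in keyword_lists:
    pair_counts.update(combinations(sorted(set(kws)), 2))
print(pair_counts[("here", "text")])  # 1
print(pair_counts[("here", "red")])   # 2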
My problem is that the code is very slow at the moment (about 1 hour for 300 rows of short texts). I suspect it is because every pair triggers two full scans of main_df inside indexes_by_word. Below is the code I am using to build edge_df and apply the function. Is there anything I can do to speed this up?
from itertools import combinations

import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # enables .progress_apply used below

def indexes_by_word(word1, word2):
    """
    Count the texts whose keyword lists contain both words.
    """
    indx1 = set(main_df[main_df['Keywords'].apply(lambda lst: word1 in lst)].index)
    indx2 = set(main_df[main_df['Keywords'].apply(lambda lst: word2 in lst)].index)
    return len(indx1.intersection(indx2))

# Make a list of all unique words
unique_words = main_df['Keywords'].apply(pd.Series).stack().reset_index(drop=True).unique()

# Make an edge-list dataframe of all word pairs
edges = pd.DataFrame(data=list(combinations(unique_words, 2)),
                     columns=['src', 'dst'])

# Two full scans of main_df for every pair
edges['weight'] = edges.progress_apply(lambda x: indexes_by_word(x['src'], x['dst']), axis=1)
edges.head()