I have a for loop that gets slower over time (roughly 10x slower). The loop iterates over a very large corpus of tweets (7M) looking for keywords taken from a lookup table. If a keyword is found in the tweet, a DataFrame is updated.
import re
import pandas as pd

for n, sent in enumerate(corpus):
    # check every keyword against the current tweet
    for i, word in words['token'].items():
        tag_1 = words['subtype_I'][i]
        tag_2 = words['subtype_II'][i]
        if re.findall(word, sent):
            # build a one-row frame for the match and append it to the results
            df = pd.DataFrame([[sent, tag_1, tag_2, word]],
                              columns=['testo', 'type', 'type_2', 'trigger'])
            data = data.append(df)
            print(n)
        else:
            continue
It starts out processing roughly 1,000 lines per second, but after about 900K iterations it slows down to around 100.
What am I missing here? Is it a memory allocation problem? Is there a way to speed this up?
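In case it helps, here is a sketch of the direction I've been considering (just a sketch, assuming data only needs the four columns shown): precompile the patterns once, collect the matches in a plain Python list, and build the DataFrame a single time at the end instead of appending row by row. I'm not sure whether this actually addresses the slowdown.

import re
import pandas as pd

# compile each keyword pattern once, outside the corpus loop
compiled = [(re.compile(word), tag_1, tag_2, word)
            for word, tag_1, tag_2 in zip(words['token'],
                                          words['subtype_I'],
                                          words['subtype_II'])]

rows = []
for n, sent in enumerate(corpus):
    for pattern, tag_1, tag_2, word in compiled:
        if pattern.search(sent):  # search() is enough for a yes/no match test
            rows.append([sent, tag_1, tag_2, word])

# build the DataFrame once, after all matches are collected
data = pd.DataFrame(rows, columns=['testo', 'type', 'type_2', 'trigger'])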