I am trying to select segments (clauses) of sentences based on the word pairs that the segments should start with. For example, I am interested in sentence segments that start with "what does" or "what is", etc.
To do this, I am looping over two DataFrames, using an if statement inside a for loop as shown below. The column df1['Sentence'] of the first DataFrame contains the sentences; the column df2['First2'] of the second DataFrame contains the pairs of starting words. However, the function seems to check only the first word pair in the for loop: after the first item, it does not continue with the next one. My code seems to work when I pass lists to it, but not when I pass DataFrames. I have tried the solutions mentioned in Pythonic way to combine FOR loop and IF statement, but they do not work for my DataFrame. I would love to know how to solve this.
DataFrames:

df1:
   Sentence
0  If this is a string what does it say?
1  And this is a string, should it say more?
2  This is yet another string.

df2:
   First2
0  what does
1  should it
2
My code looks as follows:
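import pandas as pd

# Sample data from the tables above (the third First2 entry is blank)
df1 = pd.DataFrame({'Sentence': ['If this is a string what does it say?',
                                 'And this is a string, should it say more?',
                                 'This is yet another string.']})
df2 = pd.DataFrame({'First2': ['what does', 'should it', '']})

a = df1['Sentence']
b = df2['First2']

# The function seems to loop over all r's but not over all b's:
def func(r):
    for i in b:
        if i in r:
            # The following line selects the sentence segment that starts with
            # the words in `First2`, up to the end of the sentence.
            q = r[r.index(i):]
            return q
        else:
            return ''

df1['Clauses'] = a.apply(func)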
This is the result:
what does it say?
This is correct but incomplete: the code seems to loop over all r's but not over all b's. How can I get the desired result below?
what does it say?
should it say more?
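A minimal sketch of one possible fix follows. It assumes the sample data from the tables above, with the blank third entry of First2 taken to be an empty string. In func, the else branch sits inside the for loop, so the first word pair that does not match already triggers return '' and the remaining pairs are never checked. Iterating over a Series yields its values just like iterating over a list, so the same thing would happen if b were a list. Moving the fallback return '' after the loop lets every pair be tried. The names pairs and Clauses_re are illustrative. The regex variant at the end is a swapped-in alternative technique (a single alternation passed to Series.str.extract), not part of the original code.

```python
import re

import pandas as pd

# Sample data reconstructed from the tables in the question; the blank third
# entry of First2 is assumed here to be an empty string.
df1 = pd.DataFrame({'Sentence': [
    'If this is a string what does it say?',
    'And this is a string, should it say more?',
    'This is yet another string.',
]})
df2 = pd.DataFrame({'First2': ['what does', 'should it', '']})

# Drop missing or empty word pairs: '' is contained in every string, and a
# NaN would raise TypeError in the `in` test.
pairs = [p for p in df2['First2'].dropna() if p]


def func(r):
    """Return the part of sentence r that starts at the first matching pair."""
    for i in pairs:
        if i in r:
            # Slice from where the pair starts to the end of the sentence.
            return r[r.index(i):]
    # Give up only after every pair has been checked.
    return ''


df1['Clauses'] = df1['Sentence'].apply(func)
print(df1['Clauses'].tolist())
# ['what does it say?', 'should it say more?', '']

# Alternative (assumes pairs is non-empty): one regex alternation, with each
# pair escaped so any punctuation in it is matched literally. Sentences with
# no match give NaN, which is replaced by ''.
pattern = '((?:' + '|'.join(map(re.escape, pairs)) + ').*)'
df1['Clauses_re'] = df1['Sentence'].str.extract(pattern, expand=False).fillna('')
```

Like the original, the `in` test matches substrings, so a pair can also match inside a longer word (for example, "what is" inside "somewhat is"); adding `\b` word boundaries around the alternation in the regex version restricts matches to whole words. The two versions can also differ when a sentence contains several pairs: the loop returns the pair that comes first in First2, while the regex returns the one that appears first in the sentence.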