I have two data frames built using pandas with more than 13 columns each.
- In df1, one of the columns is company_name_x.
- In df2, one of the columns is company_name_y.
Both columns contain many company names stored as strings. As output, I want to display matching companies only if at least the initial part (say, the first 50%) of company_name_x and company_name_y match each other. I am also calculating the fuzz ratio, which seems to be working fine. However, combining the fuzz condition with the condition above doesn't work.
It gives an indexing error:
Unalignable boolean Series key provided
Below is the code I am using:
df4 = df3[df3.Fuzz>85][df3.company_name_mod_x[0:len(df3.company_name_mod_x)/2] ==
df3.company_name_mod_y[0:len(df3.company_name_mod_y)/2]]
df3 is the frame which has the top fuzz ratio for each possible pair of df1 and df2.
The output should contain only the company pairs that have Fuzz > 85 (this part works fine) and whose names match over at least the first half (this part isn't working).
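To make the intended condition concrete, here is a minimal sketch of the row-wise check I have in mind. The sample data and the helper first_half_matches are made up for illustration; only the column names come from my real df3. It also assumes that "first half" means the first half of the shorter of the two names, which is just one possible reading of the 50% rule.

```python
import pandas as pd

# Made-up stand-in for df3; only the column names match my real frame.
df3 = pd.DataFrame({
    "company_name_mod_x": ["acme corp", "globex inc", "initech llc", "umbrella co"],
    "company_name_mod_y": ["acme corporation", "globex international",
                           "initrode llc", "umbrella pharmaceuticals group"],
    "Fuzz": [90, 88, 92, 60],
})

def first_half_matches(x: str, y: str) -> bool:
    """Hypothetical helper: compare the first half of the shorter name
    with the same-length prefix of the other name."""
    n = min(len(x), len(y)) // 2  # integer division; `/` gives a float in Python 3
    return n > 0 and x[:n] == y[:n]

# Build one boolean per row, indexed like df3 so it aligns with df3.Fuzz > 85.
half_match = pd.Series(
    [first_half_matches(x, y)
     for x, y in zip(df3["company_name_mod_x"], df3["company_name_mod_y"])],
    index=df3.index,
)

# Combine both conditions in a single mask instead of chaining two [] lookups.
df4 = df3[(df3["Fuzz"] > 85) & half_match]
print(df4)
```

On this sample, the first two rows are the ones I would expect to keep: the third fails the prefix check and the fourth fails the fuzz threshold. As far as I can tell, the slice in my original code takes the first half of the rows of the column rather than the first half of each string, which may be related to the error.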