Say I have a pandas dataframe that looks like this:
ID    String1                         String2
1     The big black wolf              The small wolf
2     Close the door on way out       door the Close
3     where's the money               where is the money
4     123 further out                 out further
I want to cross-tabulate the words of String1 and String2 within each row and then apply fuzzy string matching to each pair of words, similar to "Python fuzzy string matching as correlation style table/matrix".
My challenge is that the solution in the link I posted only works when String1 and String2 contain the same number of words. Secondly, that solution compares all the rows in the column, whereas I only want a row-by-row comparison.
The proposed solution should build a matrix-like comparison for each row. For row 1 it would look like:
       string1      The  big  black  wolf  Maximum
       string2
       The          100  0    0      0     100
       small        0    0    0      0     0
       wolf         0    0    0      100   100
The final output should then look like:
ID    String1                         String2               Matching_Average
1     The big black wolf              The small wolf        66.67
2     Close the door on way out       door the Close
3     where's the money               where is the money
4     123 further out                 out further
where Matching_Average is the sum of the 'Maximum' column divided by the number of words in String2.
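
To make the calculation concrete, here is a minimal sketch of what I mean, not a definitive implementation. The names `word_score`, `exact_score`, `score_matrix` and `matching_average` are made up for illustration. `word_score` uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real fuzzy scorer, and `exact_score` is only there to reproduce the 0/100 values in the example matrix above (a real fuzzy scorer would usually give non-zero scores for words that share letters).

```python
from difflib import SequenceMatcher


def word_score(a, b):
    """Similarity of two words on a 0-100 scale (stand-in for a fuzzy scorer)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def exact_score(a, b):
    """Assumed scorer that mirrors the example matrix: 100 if equal, else 0."""
    return 100.0 if a == b else 0.0


def score_matrix(string1, string2, scorer=word_score):
    """Rows are the words of string2, columns are the words of string1."""
    cols = string1.split()
    rows = string2.split()
    return [[scorer(r, c) for c in cols] for r in rows]


def matching_average(string1, string2, scorer=word_score):
    """Sum of the per-row maxima divided by the number of words in string2."""
    matrix = score_matrix(string1, string2, scorer)
    if not matrix or not matrix[0]:
        return 0.0  # one of the strings has no words
    return sum(max(row) for row in matrix) / len(matrix)


# Row 1 of the example, using the exact-match scorer
print(round(matching_average("The big black wolf", "The small wolf", exact_score), 2))
```

On a DataFrame `df`, I assume this could be applied row by row with something like `df["Matching_Average"] = df.apply(lambda r: matching_average(r["String1"], r["String2"]), axis=1)`, and a fuzzy scorer such as fuzzywuzzy's `fuzz.ratio` could be passed as `scorer` instead of the stand-in.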