I'm trying to figure out a way to calculate the highest percentage match when comparing a string against every string in a column.
Example:
Column A     Column B
Key          Keylime
Key Chain    Status
             Serious
             Extreme
             Key
Desired output:
Column A     Column B    Column C    Column D
Key          Temp        100%        Key
Key Chain    Status      66.7%       Key Ch
Ten          Key Ch      100%        Tenure
             Extreme
             Key
             Tenure
To expand on this:
- Column A is the column with strings to individually match
- Column B is the reference column
- Column C provides the highest percent match the column A string has with any string in column B.
- Column D provides the word from column B associated with the highest percent match
To expand on Column C: for Key Chain, the highest match against any string in column B is Key Ch, where 6 of the 9 characters of Key Chain (including the space) match, giving a percentage match of 6/9 = 66.7%.
- That being said, this isn't a deal breaker, but it is something that sticks out: the logic above has no way to penalize matches like Ten, where Ten has 3 out of 3 characters that match against Tenure, giving it an inflated 100% match that I still can't think of a way to correct for.
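For reference, here is a minimal Python sketch of the scoring as described for Column C and Column D. It is only an illustration under some assumptions that are not stated above: "matching characters" is taken to mean the shared leading characters (a common prefix), Column A strings are non-empty, and when two reference strings tie, an exact match is preferred. The names prefix_match_len and best_match are hypothetical helpers.

```python
def prefix_match_len(a: str, b: str) -> int:
    """Count the leading characters that a and b share (spaces included)."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def best_match(a: str, references: list) -> tuple:
    """Return (Column C score, Column D string) for one Column A string."""
    best_score, best_ref = 0.0, ""
    for ref in references:
        # Score = matched characters / length of the Column A string.
        score = prefix_match_len(a, ref) / len(a)
        # Assumption: on a tie an exact match wins; otherwise the
        # first reference seen in column order is kept.
        if score > best_score or (score == best_score and ref == a):
            best_score, best_ref = score, ref
    return best_score, best_ref


column_b = ["Temp", "Status", "Key Ch", "Extreme", "Key", "Tenure"]
for a in ["Key", "Key Chain", "Ten"]:
    score, ref = best_match(a, column_b)
    print(f"{a!r}: {score:.1%} -> {ref!r}")
```

Because the score divides only by the length of the Column A string, Ten still scores 100% against Tenure, which is exactly the inflated match described above.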