I have data coming from two different systems which I input into two different tables in SQL Server 2016. These tables are related with a Key.
Table1
Key Name
------------------------
1 Jim Parsons
2 Steven Woz
3 Billy Jean
4 Carol Henderson
5 Sara Hauder
Table2
Key Name
---------------------------
1 Jimmy Parsons
2 Steven Wozniak
3 Bill Jean
4 Andy Smith
5 Sara Hauder (a)
What I am hoping to develop is an additional column that will tell me if the Names match. However, if the names are similar, I want to consider it a match.
In order to accomplish this I was thinking I could do something like
select
*,
case when a.Name like b.Name 1 else 0
end as Flag
from
(Table1 a
join
Table2 b on a.Key = b.Key)
But obviously this wouldn't work because even the slightest difference can cause a like statement to be false.
I'm not very familiar with how to find the longest common substring, but I don't actually care what the string is. What I'm thinking is I need to determine the average number/percentage of characters shared by the two fields and then say: if the number of characters shared between a.Name and b.Name is above or equal to the average, then 1 else 0. Not really sure how to go about this however...
AS an example: Sara Hauder should match with Sara Hauder (a), but Carol Henderson should not match with Andy Smith. Note that many Names will sometimes have some different tags at the end in parenthesis which can be filtered out if necessary. However, I am perfectly capable of adding this feature myself if you choose to ignore.