I have 2 dataframes df1 and df2 like this:
df1:
Id    Name      
1    Tuy Hòa      
2    Kiến thụy 
3    Bình Tân
df2:
code    name      
A1     Tuy Hoà    
A2     Kiến Thụy  
A3     Tân Bình 
Now when I use merge:
out_df = pd.merge(df1, df2, left_on=['Name'], right_on=['name'], how='outer')
Of course the result is null, because example for Tuy Hòa <> Tuy Hoà (problem is the position sign "`"),...
Then I use fuzzy function and apply lambda but it's not correct at all :(
My function:
def checker(Name,names):
    correct_name = ''
    correct_ratio = 0
    if Name in names:
        ratio = 100
    else:  
        try:
            x = process.extractOne(Name, names, scorer=fuzz.token_set_ratio, score_cutoff = 80)
            correct_name = x[0]
            #correct_ratio = x[1]
        except:
            correct_name = ''
            #correct_ratio = 0
    return correct_name
Then I convert:
names = df2['name'].tolist()
After that I use apply to loop through, compare and output the correct (df1 must base on df2 to correct)
df1['Name'] = df1['Name'].apply(lambda x: checker(x,names) if not checker(x,names) else x)
The output now is:
df1
Id    Name      
1    Tuy Hòa      
2    Kiến Thụy 
3    Tân Bình
Seem like Tuy Hòa and Kiến Thụy are correct but Bình Tân and Tân Bình are different name! so changing position charaters in this case is correct not error in typing :( The expect output should be:
df1
Id    Name      
1    Tuy Hòa      
2    Kiến Thụy 
3    Bình Tân
I try to use fuzz.ratio with score_cutoff = 55 but some case this will miss pairs
Any solution is better than my solution? (maybe not using fuzzy library?) Please help me. Many thanks to your help <3
 
     
     
    