My issue here is joining two disparate name fields (without so many exceptions) using python/pandas.
I have two dataframes where I would like to merge on 'short_name' or 'long_name' of df 1 and 'name' of df2.
df 1:
           short_name                            long_name  age  height_cm  \
0            L. Messi       Lionel Andrés Messi Cuccittini   32        170   
1   Cristiano Ronaldo  Cristiano Ronaldo dos Santos Aveiro   34        187   
2           Neymar Jr        Neymar da Silva Santos Junior   27        175   
3            J. Oblak                            Jan Oblak   26        188   
4           E. Hazard                          Eden Hazard   28        175   
5        K. De Bruyne                      Kevin De Bruyne   28        181
df 2:
                      name     2014      2015      2016      2017      2018  \
0             Kylian Mbappé      NaN    0.0570    1.9238   51.3000  175.5600   
1                    Neymar   74.100   98.8000  114.0000  133.0000  205.2000   
2             Mohamed Salah   14.820   17.1000   26.6000   39.9000  144.4000   
3                Harry Kane    3.420   22.8000   41.8000   72.2000  159.6000   
4               Eden Hazard   53.010   74.1000   76.0000   82.6500  143.6400   
5              Lionel Messi  136.800  136.8000  136.8000  136.8000  197.6000 
I modified df2's 'name' column to follow the (first initial, last name convention) of df1's 'short_name' column. Unfortunately it led to many exceptions since df2's 'name' column doesn't always follow that convention (examples include, 'Neymar Jr' (expected: "Neymar"), Cristiano Ronaldo (expected: C. Ronaldo), and Roberto Firmino (expected: R. Firmino).
The only other thing I can think of is using substring matching. Is there a way to split df2's 'name' column into separate substrings and then see if df1's 'long_name' contains all of those elements (ie seeing if "Lionel Andrés Messi Cuccittini" has both "Lionel" and "Messi" from d1's name and then merging on it)?
After searching for a while, it doesn't seem like something in pandas functionality since it splits it into several columns. I also don't know if merges can take conditions like substring matches. Everything I've thought of doesn't address these exceptions/non-matches except for substring matching.
Edit: As another user said, it might be worth splitting the names into sets and making sure all the substrings in "name" in df2 is found in "long_name" in df1. But I cannot seem to find a way to do that without splitting the string into separate columns.
SOLVED: What seems to work is isolating and creating a copy of the columns (making them series) then splitting the names into lists and seeing if the shorter names were subsets of the longer names in a double for-loop. Here is my code:
names = df1['name']
long_names = df2['long_name']
for i in range(len(names)):
    name_list = names[i].split()
    for j in range(len(long_names)):
        long_name_list = long_names[j].split()
        if set(name_list).issubset(long_name_list):
            df2.loc[j, "long_name"] = df1.loc[i, "name"]
