A similar question was asked here (Pandas merge on multiple columns ignoring NaN) but it has no answer, so I'm asking again in the hope that someone can help.
I need to merge values from df2 into df1, but the merge key differs from row to row in df2, because each row in df2 has NaNs in different columns. For each df2 row I want to ignore the NaN columns and match only on the columns that have values.
import pandas as pd
df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]], columns=['level1', 'level2', 'level3'])
df1
   level1  level2  level3
0       0       1       3
1       0       2       4
2       1       2       5
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]], columns=['level1', 'level2', 'level3', 'value'])
df2
   level1  level2  level3  value
0     0.0     NaN     NaN     10
1     0.0     1.0     NaN     12
2     NaN     2.0     5.0     13
When I do df1.merge(df2, how='left'), I get df1 with NaN in the value column: no row matches on all the level columns, because pandas tries to match the NaN values as well.
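To make the problem reproducible, here is a minimal sketch of that plain merge on the sample frames above. The variable name `merged` is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]],
                   columns=['level1', 'level2', 'level3'])
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])

# With no `on` argument, merge joins on every shared column:
# level1, level2 and level3. No df2 row agrees with a df1 row on
# all three, so the left join keeps df1's rows with an empty value.
merged = df1.merge(df2, how='left')
print(merged)
```

Every row of df1 survives, but the value column is entirely NaN.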
What I want instead is to match each row in df2 on its non-NaN columns only:
   level1  level2  level3  value
0       0       1       3     10
1       0       1       3     12
2       0       2       4     10
3       1       2       5     13
Explanation:
Row 0 in df1 has a match on the non-NaN columns of rows 0 and 1 in df2, so it gets values 10 and 12 from there. Row 1 in df1 has a match on the non-NaN columns of row 0 in df2, so it gets value 10 from there. Row 2 in df1 has a match on the non-NaN columns of row 2 in df2, so it gets value 13 from there.
In the real data I actually have 6 level columns, and the non-NaN columns in each row of df2 can be any combination of them, or just a single column.
What I do now is iterate over the rows of df2 with iterrows, build for each one a mini-dataframe containing only its non-NaN columns, and merge df1 with it. But as we know, that is not very efficient, and I wonder if there is something better that can be done.
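For reference, here is a minimal sketch of roughly what that row-by-row approach looks like on the sample data. The names `level_cols`, `pieces`, `mini` and `result` are illustrative, and the guard for a df2 row with no keys at all is an assumption I added for safety; it does not occur in the sample.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]],
                   columns=['level1', 'level2', 'level3'])
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])

level_cols = ['level1', 'level2', 'level3']

pieces = []
for _, row in df2.iterrows():
    # Key columns that actually hold a value in this df2 row.
    keys = [c for c in level_cols if pd.notna(row[c])]
    if not keys:
        # Assumption: a row with no keys at all is skipped.
        continue
    # One-row frame holding only those keys plus the value.
    mini = pd.DataFrame([row[keys + ['value']]])
    # Inner merge keeps only the df1 rows that match this df2 row.
    pieces.append(df1.merge(mini, on=keys, how='inner'))

# Stack the per-row results and put them in a stable order.
result = (pd.concat(pieces, ignore_index=True)
            .sort_values(level_cols + ['value'])
            .reset_index(drop=True))
# iterrows upcasts the row to float, so restore the integer values.
result['value'] = result['value'].astype(int)
print(result)
```

This gives the desired rows for the sample, but it runs one merge per row of df2, which is the part that gets slow on the real data.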