I am trying to ascertain if values in a test dataframe (df2) are not appearing in another DF (df1). The following are the two DFs:
df1 created from the following source:
| field1 | field2 |
|---|---|
| AG | Agree |
| SA | Somewhat Agree |
| DG | Disagree |
| SD | Somewhat Disagree |
| NO | None |
df2 created from the following source:
| field1 | field2 |
|---|---|
| CA | California |
| TX | Texas |
| NO | None |
| NY | New York |
Using Method 1 (see below), I am getting the expected result, which is:
Method 1
diff_df = df2[~(df2[field1].isin(df1[field1]) & df2[field2].isin(df1[field2]))].reset_index(drop=True)
This gives me the folllowing expected result:
field1 field2
0 CA California
1 TX Texas
2 NY New York
Note: The duplicate value in df2 (NO: None) gets dropped, too.
However, there is one problem that I am facing: There can be situations when there are different set of fields that will need to be compared (eg. there may be a third field field3 in the equation).
From case to case basis, the number of fields would vary greatly over which the user won't have control.
My problem: How do I modify my query so that by comparing the two dataframes I get the expected result?
In the situation as explained, what shuld be the possible approach?