I have the following dataframes:
df1 = pd.DataFrame({'col1': ['A','M','C'],
'col2': ['B','N','O'],
# plus many more
})
df2 = pd.DataFrame({'col3': ['A','A','A','B','B','B'],
'col4': ['M','P','Q','J','P','M'],
# plus many more
})
Which look like these:
df1:
col1 col2
A B
M N
C O
#...plus many more
df2:
col3 col4
A M
A P
A Q
B J
B P
B M
#...plus many more
The objective is to create a dataframe containing all elements in col4 for each col3 that occurs in one row in df1. For example, let's look at row 1 of df1. We see that A is in col1 and B is in col2. Then, we go to df2, and check what col4 is for df2[df2['col3'] == 'A'] and df2[df2['col3'] == 'B']. We get, for A: ['M','P','Q'], and for B, ['J','P','M']. The intersection of these is['M', 'P'], so what I want is something like this
col1 col2 col4
A B M
A B P
....(and so on for the other rows)
The naive way to go about this is to iterate over rows and then get the intersection, but I was wondering if it's possible to solve this via merging techniques or other faster methods. So far, I can't think of any way how.