I have a dataframe with duplicate entries coming from two sources, all the values should be unique, but one column was not formatted the same, hence I should remove duplicate with different names in one column, but only if the names are within a list.
Technically, I would like to remove a row in a pandas dataframe if there exist another row with the same A and B values, but only if this row’s Z value is 'bar' and the other’s 'Z' is 'foo'.
An example might be clearer:
I have the given dataframe df
 A     B     Z
'a'   'a'   'foo'
'a'   'a'   'bar'
'b'   'a'   'bar'
'c'   'c'   'foo'
'd'   'd'   'blb'
And I would like to get
 A     B     Z
'a'   'a'   'foo'
'b'   'a'   'bar'
'c'   'c'   'foo'
'd'   'd'   'blb'
Note that:
- The rows with other values than 'foo'and'bar'in theZcolumn should not be touched.
- It’s not important if 'foo'and'bar'stay the same because they will get changed to the same value afterwards.
- It would be great to generalize the duo 'foo'and'bar'as a list.
Attempts so far: Here is my best guess, it doesn’t work though… I don’t understand much what groupby returns. Also I’m sure there is some magical pandas one-liner I just can’t find.
new_df = []
for row in df.groupby('A'):
    if rowloc['Z'].isin('foo'):
         if not row['Z'].isin('bar'):        
            new_df.append(row)
Thanks !
 
     
    