I have a DataFrame (df) and want to remove duplicate rows based on ID.
   Name     Symbol  ID
0  ZOO INC  Remove  88579Y101
1  Zoo Inc  ZZZ     88579Y101
2  A Inc    AAA     90138A103
3  a inc.   Remove  90138A103
4  2U Inc   TWUO    90214J101
5  Keep     Remove  111111111
But I only want to remove the duplicate rows where Symbol == 'Remove'. A row whose ID is unique should be kept even if its Symbol is 'Remove'. The output should look like:
   Name     Symbol  ID
0  Zoo Inc  ZZZ     88579Y101
1  A Inc    AAA     90138A103
2  2U Inc   TWUO    90214J101
3  Keep     Remove  111111111
I can't use result_df = df.drop_duplicates(subset=['ID'], keep='first') (or keep='last') because the rows I want to drop are not consistently the first or the last within each group of duplicates. Sorting alphabetically first won't help either.
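To illustrate the problem, here is a minimal sketch that rebuilds the sample data above and runs both variants. The names first_df and last_df are only for this illustration. With this data, each variant leaves one unwanted 'Remove' row behind in a duplicated group.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Name": ["ZOO INC", "Zoo Inc", "A Inc", "a inc.", "2U Inc", "Keep"],
    "Symbol": ["Remove", "ZZZ", "AAA", "Remove", "TWUO", "Remove"],
    "ID": ["88579Y101", "88579Y101", "90138A103", "90138A103",
           "90214J101", "111111111"],
})

# keep='first' retains row 0 (ZOO INC / Remove) for ID 88579Y101
first_df = df.drop_duplicates(subset=["ID"], keep="first")

# keep='last' retains row 3 (a inc. / Remove) for ID 90138A103
last_df = df.drop_duplicates(subset=["ID"], keep="last")

print(first_df)
print(last_df)
```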
I know I could replace every 'Remove' with NaN and then use the solution provided here, but I am looking for an alternative because I may eventually need to pass a list of strings instead of a single value.
Does Pandas support anything like: result_df = df.drop_duplicates(subset=['ID'], keep=(df['Symbol'] != 'Remove'))?
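For reference, the keep parameter of drop_duplicates accepts only 'first', 'last', or False, so a boolean Series cannot be passed there. One possible way to express the intended behavior is a boolean mask that combines duplicated(keep=False) with isin. This is a sketch under assumptions, not a definitive answer: the list name unwanted is hypothetical, and the sample data is copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Name": ["ZOO INC", "Zoo Inc", "A Inc", "a inc.", "2U Inc", "Keep"],
    "Symbol": ["Remove", "ZZZ", "AAA", "Remove", "TWUO", "Remove"],
    "ID": ["88579Y101", "88579Y101", "90138A103", "90138A103",
           "90214J101", "111111111"],
})

# Hypothetical list of Symbol values to drop; it can hold several strings
unwanted = ["Remove"]

# True for every row whose ID appears more than once
is_dup = df.duplicated(subset=["ID"], keep=False)

# True for every row whose Symbol is in the unwanted list
is_unwanted = df["Symbol"].isin(unwanted)

# Drop only rows that are both duplicated on ID and unwanted,
# then renumber the index to match the expected output
result_df = df[~(is_dup & is_unwanted)].reset_index(drop=True)

print(result_df)
```

Two caveats with this sketch: if every row in a duplicated group has an unwanted Symbol, the whole group is dropped, and if a group contains several rows with acceptable Symbols, all of them are kept. A follow-up drop_duplicates(subset=['ID']) could handle the second case if needed.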