I am trying to return a df with duplicate rows removed. I have tried drop_duplicates(), but the values in the subset columns aren't in the same order: two rows can hold the same pair of values, just swapped between the columns, and they are not treated as duplicates.
For instance, with the df below, removing duplicates on Item_X and Item_Y returns the same df, whereas the intended output would drop the second row (index 1), because ('Bar', 'Foo') is the same pair as ('Foo', 'Bar').
import pandas as pd

d = {
    'Item_X': ['Foo', 'Bar', 'Bot', 'Bot', 'Bar', 'Foo'],
    'Item_Y': ['Bar', 'Foo', 'Foo', 'Bot', 'Bar', 'Foo'],
    'Value': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data=d)
df.drop_duplicates(subset=['Item_X', 'Item_Y'])
Expected Result:
  Item_X Item_Y  Value
0    Foo    Bar      1
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6
Actual Output (Incorrect):
  Item_X Item_Y  Value
0    Foo    Bar      1
1    Bar    Foo      2
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6
What would be the most efficient way to tackle this problem?
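For reference, here is a minimal sketch of one possible approach, not necessarily the most efficient one. It assumes only Item_X and Item_Y define a duplicate and that the first occurrence should be kept. Each pair is sorted row-wise with NumPy so that ('Bar', 'Foo') and ('Foo', 'Bar') become the same key, and duplicated() is then run on those keys. The names pairs, keys and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Item_X': ['Foo', 'Bar', 'Bot', 'Bot', 'Bar', 'Foo'],
    'Item_Y': ['Bar', 'Foo', 'Foo', 'Bot', 'Bar', 'Foo'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Sort the two values within each row, so the pair no longer depends on
# which column a value came from.
pairs = np.sort(df[['Item_X', 'Item_Y']].to_numpy(dtype=object), axis=1)

# Wrap the sorted pairs in a frame that shares df's index, then flag every
# row whose sorted pair has already appeared (the first occurrence is kept).
keys = pd.DataFrame(pairs, index=df.index)
result = df[~keys.duplicated()]

print(result)
```

With the sample data this keeps the rows at index 0, 2, 3, 4 and 5, which matches the expected result above. The original column values are left untouched because the sorting happens on a separate copy.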