I have a dataframe such as:
COL1 COL2
Homo_sapiens Mus_musculus
Mus_musculus Homo_sapiens
Droso_A Droso_b
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
How can I remove duplicated pairs of values across COL1 and COL2, no matter which column each value is in? In other words, I want to treat (A, B) and (B, A) as the same pair and keep only one of them. Here is an example:
For instance, in the first row Homo_sapiens is in COL1 and Mus_musculus is in COL2.
Since the second row holds the same pair reversed (Mus_musculus in COL1 and Homo_sapiens in COL2),
I only keep the first one:
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
Then Droso_A and Droso_b form a classic duplicate, which can be removed using:
df = df.drop_duplicates(subset=["COL1", "COL2"])
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
Finally, the two Betta_spe rows (paired with Rattus_rattus and Rattus_norvegirus) are not duplicates of anything, so both are kept:
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
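One possible way to do both steps at once is sketched below. It is not necessarily the only or best approach: it sorts the two values within each row so that reversed pairs end up with the same key, then keeps the first row per key. The names key and result are just illustrative, and the frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "COL1": ["Homo_sapiens", "Mus_musculus", "Droso_A",
                 "Droso_A", "Betta_spe", "Betta_spe"],
        "COL2": ["Mus_musculus", "Homo_sapiens", "Droso_b",
                 "Droso_b", "Rattus_rattus", "Rattus_norvegirus"],
    }
)

# Sort the two values inside each row, so (A, B) and (B, A) share one key.
key = pd.DataFrame(
    np.sort(df[["COL1", "COL2"]].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() flags every later occurrence of a key; keep the first one only.
result = df[~key.duplicated()].reset_index(drop=True)
print(result)
```

Because the sorted values are used only as a key, the kept rows retain their original column order (Homo_sapiens stays in COL1 and Mus_musculus in COL2).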