I have a table similar to this one:
index  name_1  path1         name_2  path2
0      Roy     path/to/Roy   Anne    path/to/Anne
1      Anne    path/to/Anne  Roy     path/to/Roy
2      Hari    path/to/Hari  Wili    path/to/Wili
3      Wili    path/to/Wili  Hari    path/to/Hari
4      Miko    path/to/miko  Lin     path/to/lin
5      Miko    path/to/miko  Dan     path/to/dan
6      Lin     path/to/lin   Miko    path/to/miko
7      Lin     path/to/lin   Dan     path/to/dan
8      Dan     path/to/dan   Miko    path/to/miko
9      Dan     path/to/dan   Lin     path/to/lin
...
As you can see, the table shows relationships between entities:
Roy is with Anne,
Wili is with Hari,
Lin is with Dan and with Miko.
The table actually shows overlap data: Hari and Wili, for example, have the same document, and I would like to remove one of them so that I don't keep duplicated files. To do this, I would like to create a new table that lists each relationship only once, so I can later build a list of paths to remove.
The resulting table would look like this:
index  name_1  path1         name_2  path2
0      Roy     path/to/Roy   Anne    path/to/Anne
1      Hari    path/to/Hari  Wili    path/to/Wili
2      Miko    path/to/miko  Lin     path/to/lin
3      Miko    path/to/miko  Dan     path/to/dan
The idea is that I'll use the values of "path2" to remove the files at those paths, and will still have the files in "path1". For that reason, this line:
4      Lin     path/to/lin   Dan     path/to/dan
is missing, as those files will already be removed via the Miko rows. Any ideas how to do this? :)
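To make the rule above concrete, here is a minimal sketch of one possible reading of it: walk the rows in order, keep a row only if its name_1 has not already been marked for removal and its name_2 is neither a kept name nor already marked. The helper name keep_one_per_relationship and the sample frame are assumptions made for illustration, and the greedy row order is a guess at the intended behaviour rather than a confirmed solution.

```python
import pandas as pd

# Sample data copied from the table above (paths keep the original casing).
paths = {
    "Roy": "path/to/Roy", "Anne": "path/to/Anne", "Hari": "path/to/Hari",
    "Wili": "path/to/Wili", "Miko": "path/to/miko", "Lin": "path/to/lin",
    "Dan": "path/to/dan",
}
names_1 = ["Roy", "Anne", "Hari", "Wili", "Miko", "Miko", "Lin", "Lin", "Dan", "Dan"]
names_2 = ["Anne", "Roy", "Wili", "Hari", "Lin", "Dan", "Miko", "Dan", "Miko", "Lin"]
df = pd.DataFrame({
    "name_1": names_1,
    "path1": [paths[n] for n in names_1],
    "name_2": names_2,
    "path2": [paths[n] for n in names_2],
})


def keep_one_per_relationship(frame):
    """Hypothetical helper: greedy single pass over the rows, in order."""
    keepers = set()   # names whose files stay (they appeared as name_1 in a kept row)
    removed = set()   # names whose files are slated for removal (name_2 of a kept row)
    kept_rows = []
    for idx, n1, n2 in zip(frame.index, frame["name_1"], frame["name_2"]):
        # Skip self-pairs, rows whose name_1 is already slated for removal,
        # and rows that would remove a keeper or an already removed name.
        if n1 == n2 or n1 in removed or n2 in keepers or n2 in removed:
            continue
        keepers.add(n1)
        removed.add(n2)
        kept_rows.append(idx)
    return frame.loc[kept_rows].reset_index(drop=True)


result = keep_one_per_relationship(df)
paths_to_remove = result["path2"].tolist()
print(result)
```

On the sample rows this keeps Roy/Anne, Hari/Wili, Miko/Lin and Miko/Dan, so paths_to_remove holds the four "path2" values from the desired table. The outcome depends on row order, so a differently ordered table could keep a different name from each group.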
Edit:
I have tried this based on this answer:
df_2= df[~pd.DataFrame(np.sort(df.values,axis=1)).duplicated()]
It's true that I get fewer rows in my dataframe (it had 695 and now has 402), but I still have rows like these at the top:
index  name_1  path1         name_2  path2
0      Roy     path/to/Roy   Anne    path/to/Anne
1      Anne    path/to/Anne  Roy     path/to/Roy
...
meaning I still have the same issue.
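For comparison, here is a hedged variant of the attempt above that sorts only the two name columns and gives the mask the same index as df, so the boolean selection lines up row by row. Whether either point explains the leftover Roy/Anne rows in the real data is an assumption that can't be confirmed from the sample; the sample frame and the names pairs and is_mirror are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the first table (only the name columns matter here).
names_1 = ["Roy", "Anne", "Hari", "Wili", "Miko", "Miko", "Lin", "Lin", "Dan", "Dan"]
names_2 = ["Anne", "Roy", "Wili", "Hari", "Lin", "Dan", "Miko", "Dan", "Miko", "Lin"]
df = pd.DataFrame({
    "name_1": names_1,
    "path1": ["path/to/" + n for n in names_1],
    "name_2": names_2,
    "path2": ["path/to/" + n for n in names_2],
})

# Sort each (name_1, name_2) pair so that A/B and B/A become identical rows.
pairs = np.sort(df[["name_1", "name_2"]].to_numpy(), axis=1)

# Reuse df's index so the boolean mask aligns with df row by row.
is_mirror = pd.DataFrame(pairs, index=df.index).duplicated()

df_2 = df[~is_mirror]
print(df_2)
```

On the sample rows this drops the mirrored pairs and keeps indices 0, 2, 4, 5 and 7. Row 7 (Lin/Dan) survives, so removing mirrored pairs alone does not produce the desired table; the extra rule about names already covered by Miko is still needed.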