I am working with a DataFrame that may contain duplicate rows.
I looked at this link but couldn't find what I needed.
What I tried is to create a list of duplicates using df.duplicated(), which gives me a True or False value for each index.
Then, for each index where the result is True, I select the rows sharing that id with df.loc[df['id'] == df['id'][dups]]. Depending on this result, I call a function giveID() which returns a list of indexes to delete from the duplicates list. Since I don't need to iterate over the duplicates that are going to be deleted, is it possible to delete these indexes from the duplicates list during the for loop, without breaking everything?
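In plain Python, the pattern I am worried about looks like this (just a sketch of the idea, not my actual code):

# removing items from a list while looping over it directly skips elements,
# so the usual workaround is to iterate over a snapshot of the list
items = [0, 1, 2, 3]
for x in list(items):    # loop over a copy of the list...
    if x % 2 == 0:
        items.remove(x)  # ...so removing from the original is safe
print(items)             # [1, 3]

I am wondering whether the same kind of thing is possible with a pandas Series of duplicates.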
Here is an example of my df (the duplicates are based on the id column):
   | id  | type
----------------
0  | 312 | data2
1  | 334 | data
2  | 22  | data1
3  | 312 | data8
# Here 0 and 3 are duplicates, based on the id column
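For reference, the example frame above can be rebuilt like this (values copied from the table):

import pandas as pd

df = pd.DataFrame({'id': [312, 334, 22, 312],
                   'type': ['data2', 'data', 'data1', 'data8']})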
Here is a part of my code:
def giveID(dup_rows):
    # some code that returns a list of indexes
    ...

duplicates = df.duplicated(subset='id', keep=False)  # True for every row whose id appears more than once
duplicates = duplicates[duplicates]                  # keep only the True entries
listidx = []
for dups in duplicates.index:
    dup_id = df.loc[df['id'] == df['id'][dups]]      # all rows sharing this row's id
    for a in giveID(dup_id):
        if a not in listidx:
            listidx.append(a)
    # here I want to delete all of listidx from duplicates inside the for loop,
    # so that I don't iterate over unnecessary duplicates
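To clarify what I mean by skipping unnecessary duplicates, this is roughly the behaviour I am after, written with a skip set instead of deleting from duplicates mid-loop (a sketch only: giveID is assumed to return a list of index labels):

to_skip = set()
for dups in duplicates.index:
    if dups in to_skip:
        continue                      # already marked for deletion, skip it
    dup_id = df.loc[df['id'] == df['id'][dups]]
    to_skip.update(giveID(dup_id))
# drop the collected indexes once, after the loop
duplicates = duplicates.drop(labels=list(to_skip & set(duplicates.index)))

Ideally the pruning would happen while the loop runs, not once at the end.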
This is what duplicates looks like in my code:
0          True
1          True
582        True
583        True
605        True
606        True
622        True
623        True
624        True
625        True
626        True
627        True
628        True
629        True
630        True
631        True
           ... 
1990368    True
1991030    True
And I would like to get the same thing, but without the unnecessary duplicates.
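In other words, the end state should look as if I had run the line below, except applied progressively during the loop rather than once at the end:

# errors='ignore' skips labels that are not (or no longer) in the Series
duplicates = duplicates.drop(listidx, errors='ignore')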