I am having performance issues with iterrows on my DataFrame as I start to scale up my data analysis.
Here is the current loop that I am using:
dl = []  # row positions to drop
for ii, i in a.iterrows():
    for ij, j in a.iterrows():
        if ii != ij:
            # the 5-character core can sit in either position within DOCNO
            if (i['DOCNO'][-5:] == j['DOCNO'][4:9]
                    or i['DOCNO'][-5:] == j['DOCNO'][-5:]):
                # keep the row with the larger RSLTN1
                if i['RSLTN1'] > j['RSLTN1']:
                    dl.append(ij)
                else:
                    dl.append(ii)
c = a.drop(a.index[dl])
The point of the loop is to find 'DOCNO' values that differ as strings but are known to be equivalent, denoted by a shared 5-character core that sits at a different position in each string. When a match is found, I want to drop the row with the smaller value in the associated 'RSLTN1' column. Additionally, my data set may have multiple entries for a single 'DOCNO', and there too I want to drop the rows with the lower 'RSLTN1' results.
I was successful running this with small quantities of data (~1000 rows), but as I scale up 10x I am running into performance issues. Any suggestions?
Sample from the dataset:
In [107]:a[['DOCNO','RSLTN1']].sample(n=5)
Out[107]: 
           DOCNO   RSLTN1
6815  MP00064958  72386.0
218   MP0059189A  65492.0
8262  MP00066187  96497.0
2999  MP00061663  43677.0
4913  MP00063387  42465.0
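One direction I have been considering, sketched below on made-up rows in the same format as the sample: if the shared 5 characters can be derived per row (here I assume they sit at positions 4:9 when the DOCNO ends in a letter, and in the last 5 characters otherwise), a groupby on that canonical key replaces the O(n²) pairwise scan with a single pass.

```python
import pandas as pd

# Made-up rows mimicking the question's format (values are illustrative only).
a = pd.DataFrame({
    'DOCNO':  ['MP00059189', 'MP0059189A', 'MP00064958', 'MP00064958'],
    'RSLTN1': [65000.0, 65492.0, 72386.0, 70000.0],
})

# Canonical 5-character key (assumption): a DOCNO ending in a letter carries
# its core at positions 4:9, otherwise in the last 5 characters.
key = a['DOCNO'].map(lambda s: s[4:9] if s[-1].isalpha() else s[-5:])

# Keep only the row with the largest RSLTN1 within each equivalence group,
# instead of comparing every pair of rows with nested iterrows.
c = a.loc[a.groupby(key)['RSLTN1'].idxmax()]
```

Whether that key rule covers every DOCNO variant in the real data is the part I am unsure about.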