I have 2 dataframes. The first one (900 lines) contains corrections that have been applied to a deal. The second dataframe (140 000 lines) contains the list of deals with corrected values. What I am trying to do is to put the old value back.
To link the corrected deals to the corrections I have to compare a number of attributes. In the correction dataframe (900 lines) I have the old and the new value for each corrected attribute. But each correction can be corrected on a different attribute, therefore I check every possible corrected attribute (in the correction dataframe) to compare the new value with the old one and check if this attribute was corrected. If it was I put the old value back. I'm precise that a correction can apply on several deals that share the same data in the fields used to identify.
To finish, I create a new column on the Deals dataframe (140 000 lines) where I put a boolean that true when a deals has been uncorrected, false otherwise.
My code right now is quite gross, I wanted to factorize a bit but the iteration process blocked me. It is running but it has to go through 900*140 000 lines. I launched it on a quad core VM with 12Gb RAM and it went through through it in about 1h20min.
How can I improve performance? Is multithreading possible to use in this case ? Here is shorted version of my code, just imagine that the number of if statements is 10 times bigger.
def CreationUniqueid(dataframe,Correction):
#creating new column to mark the rows we un corrected
dataframe['Modified']=0
#getting the link between the corrections and deals
b=0
for index in Correction.index:
    b+=1 #just values to see progression of the program
    c=0
    for index1 in dataframe.index:
        c+=1
        a=0
        print('Handling correction '+str(b)+' and deal '+str(c)) # printing progress
        if (Correction.get_value(index,'BO Branch Code')==dataframe.get_value(index1,'Wings Branch') and Correction.get_value(index,'Profit Center')==dataframe.get_value(index1,'Profit Center'))
            print('level 1 success')
            if ((Correction.get_value(index,'BO Trade Id')==dataframe.get_value(index1,'Trade Id') and Correction.get_value(index,'BO Trade Id')!='#') or
                    (Correction.get_value(index,'Emetteur Trade Id')==dataframe.get_value(index1,'Emetteur Trade Id')=='#' and Correction.get_value(index,'BO Trade Id')==dataframe.get_value(index1,'Trade Id'))):
                print('identification success')
                # putting the dataframe to the old state, we need the data in the bad shape to make the computer learn what is a bad trade and what is normal
                if Correction.get_value(index,'Risk Category')!=Correction.get_value(index,'Risk Categgory _M') and Correction.get_value(index,'Risk Category _M')!='':
                    dataframe.set_value(index1,'Risk Category',Correction.get_value(index,'Risk Category'))
                    a=1
                    print('Corr 1 ')
                if Correction.get_value(index,'CEC Ricos')!=Correction.get_value(index,'CEC Ricos _M') and Correction.get_value(index,'CEC Ricos _M')!='':
                    dataframe.set_value(index1,'CEC Ricos',Correction.get_value(index,'CEC Ricos'))
                    a=1
                    print('Corr 2')
                if Correction.get_value(index,'Product Line')!= Correction.get_value(index,'Product Line _M') and Correction.get_value(index,'Product Line _M')!='':
                    dataframe.set_value(index1,'Product Line Code Ricos',Correction.get_value(index,'Product Line'))
                    a=1
                    print ('corr 3')
    return dataframe
