I have a CSV file with 100 million rows, and I am working on a PC with 14 GB of RAM, so I split the file into two parts of 50 million rows each. I have been waiting for two days for the script to execute just this code:
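```python
# Zero-pad Column1 to 7 characters (e.g. 123 -> '0000123')
df['Column1'] = df['Column1'].apply('{:0>7}'.format)
# Row-by-row loop over all 50 million rows
for index in df.index:
    if df.loc[index, 'Column2'] == 0.0 and df.loc[index, 'Column3'] == 0:
        # Keep the first six characters of the padded Column1
        df.loc[index, 'Column4'] = df.loc[index, 'Column1'][:6]
    else:
        df.loc[index, 'Column4'] = 'F'
```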
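Simplifying the code in the vectorized sense is likely to make a large difference, because each `df.loc[index, ...]` lookup and assignment goes through pandas' indexing machinery separately for every row. Below is a minimal sketch of the same logic applied to whole columns at once. It assumes `Column1` holds non-negative integers (for those, `str.zfill(7)` gives the same result as `'{:0>7}'.format`) and that `Column2` and `Column3` are numeric; the function name `add_column4` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_column4(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized equivalent of the row-by-row loop."""
    # Zero-pad Column1 to 7 characters for the whole column in one call
    # (assumes non-negative integers, where zfill matches '{:0>7}'.format)
    df['Column1'] = df['Column1'].astype(str).str.zfill(7)
    # Boolean mask evaluated for all rows at once instead of per row
    mask = (df['Column2'] == 0.0) & (df['Column3'] == 0)
    # First six characters where the mask is True, 'F' everywhere else
    df['Column4'] = np.where(mask, df['Column1'].str[:6], 'F')
    return df
```

On the six sample rows in the table, this should produce `F`, `558769`, `F`, `F`, `111111`, `F` in `Column4`. `np.where` picks between the two candidate values element by element, so no Python-level loop over the index is needed.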
If there were a way to simplify this code, would it run faster?

Here is a sample of the data, with the expected result in Column4:
```
   Column1  Column2  Column3  Column4
0  5487964      1.0      2.0        F
1  5587694      0.0        0   558769
2  7934852      1.0        0        F
3  5487964      0.0      2.0        F
4  1111111      0.0        0   111111
5  5487964      1.0      2.0        F
```
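Splitting the file by hand may not be necessary for memory reasons: when `pandas.read_csv` is given a `chunksize`, it returns an iterator of DataFrames, so the file can be streamed through column-wise logic and appended to an output file piece by piece. A sketch under these assumptions: the function name `process_csv_in_chunks`, the paths, and the default of 1,000,000 rows per chunk are hypothetical; `Column1` holds non-negative integers; and `Column2` and `Column3` are numeric.

```python
import numpy as np
import pandas as pd


def process_csv_in_chunks(src, dst, chunksize=1_000_000):
    """Hypothetical helper: read src in chunks, add Column4, write to dst."""
    # newline='' is what the pandas docs recommend for file objects passed
    # to to_csv; it prevents doubled line endings on Windows
    with open(dst, 'w', newline='') as out:
        # With chunksize set, read_csv yields DataFrames of up to chunksize rows
        for i, chunk in enumerate(pd.read_csv(src, chunksize=chunksize)):
            # Zero-pad Column1 to 7 characters (assumes non-negative integers)
            chunk['Column1'] = chunk['Column1'].astype(str).str.zfill(7)
            mask = (chunk['Column2'] == 0.0) & (chunk['Column3'] == 0)
            chunk['Column4'] = np.where(mask, chunk['Column1'].str[:6], 'F')
            # Write the header with the first chunk only, then append rows
            chunk.to_csv(out, header=(i == 0), index=False)
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the available RAM rather than splitting the CSV in advance.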