I'm pretty new to Pandas and am kind of stucked with a problem to replace Nan-Values with median values from a different dataframe. The median dataframe has a different form, because I had to group the original df to get the medians.
My main dataframe df1 looks something like this:
      permno    yyyymm  BookLeverage Cash   RoE        ShareIss1Y   ShareIss5Y   SP         date        industry_id     STreversal  Price         Size      ret
541     10006   197101  -1.907577   NaN     0.114616    0.000000    0.051689    1.197606    1971-01-29  37              -4.383562   -3.863358   -12.496377  0.043836
542     10006   197102  -1.907577   NaN     0.114616    0.000000    0.051689    1.220021    1971-02-26  37              0.577428    -3.844814   -12.477833  -0.005774
543     10006   197103  -1.907577   NaN     0.114616    0.000000    0.051689    1.118353    1971-03-31  37              -9.090909   -3.931826   -12.564844  0.090909
544     10006   197104  -1.907577   NaN     0.114616    0.000000    0.051689    NaN         1971-04-30  37              -16.176471  -4.081766   -12.714785  0.161765
545     10006   197105  -1.907577   NaN     0.114616    0.000000    0.051689    1.025366    1971-05-28  37              5.105485    -4.018633   -12.651651  -0.051055  
Then I created a new dataframe df2 in which I grouped the former df by the yyyymm and industry_id column, and got the median for each time-industry panel.
The median df2 looks something like this:
                     permno  BookLeverage  Cash       RoE  ShareIss1Y  \
yyyymm industry_id                                                      
197101 01           40957.5     -2.451327   NaN  0.015212   -0.306936   
       10           19254.0     -1.300565   NaN  0.123353   -0.002747   
       12           33081.5     -2.102402   NaN -0.001043   -0.255756   
       13           26470.0     -2.028418   NaN  0.116907   -0.005262   
       14           17830.0     -1.266574   NaN  0.110059   -0.000193   
...                     ...           ...   ...       ...         ...   
202112 80           78633.0     -3.037694   NaN  0.195342         NaN   
       82           52123.0     -3.093551   NaN  0.017580         NaN   
       83           13739.0     -2.802522   NaN  0.021025         NaN   
       87           78667.5     -3.103168   NaN  0.104524         NaN   
       97           91547.0     -3.054443   NaN  0.162610         NaN   
                    ShareIss5Y        SP  STreversal     Price       Size  \
yyyymm industry_id                                                          
197101 01            -7.591944  5.439985   -9.998244 -2.684046 -11.483201   
       10            -1.432833  0.517484   -4.504504 -3.367296 -11.826440   
       12           -20.622667  2.264890  -22.648810 -2.873900 -11.501783   
       13            -0.257821  0.752112   -5.429864 -3.607534 -12.362360   
       14            -0.223948  0.636665  -16.075773 -2.729726 -11.386150   
...                        ...       ...         ...       ...        ...   
202112 80                  NaN       NaN  -10.960198 -4.539740 -16.024733   
       82                  NaN       NaN   -1.664319 -2.740474 -13.882130   
       83                  NaN       NaN   -2.383083 -4.835329 -15.843560   
       87                  NaN       NaN   -5.109321 -4.585741 -15.844537   
       97                  NaN       NaN   -1.535659 -4.487512 -16.339328   
                         ret  
yyyymm industry_id            
197101 01           0.099982  
       10           0.045045  
       12           0.226488  
       13           0.054299  
       14           0.160758  
...                      ...  
202112 80           0.109602  
       82           0.016643  
       83           0.023831  
       87           0.051093  
       97           0.015357
What I'm now trying to achieve, is to fill the NaN-values in the df1 with the corresponding value from df2. So that for example the SP column in row 544 would get the value which is in df2 at yyyymm 197104 with industry_id 37.
I tried to map over all rows and inside that over all columns and replace the NaN-values, but this broke my dataframe:
def fill_nan_with_median(row):
    date = int(row['yyyymm'])
    industry = row['industry_id']
    for label, column in row.items():
        if column == np.nan:
            median = df_median.loc[(date, industry), label]
            df_1.loc[index, label] = median
    
for index, row in df_1.iterrows():
    fill_nan_with_median(row)
 
     
    