I am working through the Kaggle Titanic problem, and one of the things I want to do is use sklearn's IterativeImputer to fill in my missing values.
I hit a roadblock after running the imputation and generating my "filled" values: I am not sure how best to update the original dataframe with them.
Code:
# IterativeImputer is experimental, so this import is required to enable it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np
titanic = pd.DataFrame(
    {
     "PassengerId": [1, 2, 3, 4, 5],
     "Survived": [0, 1, 1, 1, 0],
     "PClass": ['3', '1', '3', '1', '3'],
     "Name": ['Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss. Laina', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'Allen, Mr. William Henry'],
     "Sex": ['male', 'female', 'female', 'female', 'male'],
     "Age": [22, 38, 26, np.nan, 35],
     "SibSp": [1, 1, 0, 1, 0],
     "Parch": [0, 0, 0, 0, 0],
     "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05]
     }
    )
# Slicing dataframe to feed to imputer
titanic_sliced = titanic.loc[:, ['Age', 'SibSp', 'Parch', 'Fare']]
titanic_sliced.head()
Output of sliced dataset:
    Age  SibSp  Parch     Fare
0  22.0      1      0   7.2500
1  38.0      1      0  71.2833
2  26.0      0      0   7.9250
3   NaN      1      0  53.1000
4  35.0      0      0   8.0500
Run the imputer with a Random Forest estimator:
imp = IterativeImputer(RandomForestRegressor(), max_iter=10, random_state=0)
imputed_titanic = pd.DataFrame(imp.fit_transform(titanic_sliced), columns=titanic_sliced.columns)
imputed_titanic
Output of imputed_titanic:
     Age  SibSp  Parch     Fare
0  22.00    1.0    0.0   7.2500
1  38.00    1.0    0.0  71.2833
2  26.00    0.0    0.0   7.9250
3  36.11    1.0    0.0  53.1000
4  35.00    0.0    0.0   8.0500
So my question is: what is the best way to write the imputed values back into the original dataframe?
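For illustration, here is a minimal sketch of two ways this could be done, not a definitive answer. It assumes the imputed frame is rebuilt with the original dataframe's index so that rows align on assignment. To keep it deterministic and self-contained, the imputer call is replaced by the values from the output above, and the names imputed_values, num_cols, titanic_filled and titanic_patched are made up for the example.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the original dataframe (a few columns are enough here).
titanic = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4, 5],
        "Sex": ["male", "female", "female", "female", "male"],
        "Age": [22, 38, 26, np.nan, 35],
        "SibSp": [1, 1, 0, 1, 0],
        "Parch": [0, 0, 0, 0, 0],
        "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    }
)

num_cols = ["Age", "SibSp", "Parch", "Fare"]

# Stand-in for imp.fit_transform(titanic[num_cols]); values copied from the
# imputed output shown in the post (assumption: the real call returns an
# array of this shape, in the same row and column order).
imputed_values = np.array(
    [
        [22.00, 1.0, 0.0, 7.2500],
        [38.00, 1.0, 0.0, 71.2833],
        [26.00, 0.0, 0.0, 7.9250],
        [36.11, 1.0, 0.0, 53.1000],
        [35.00, 0.0, 0.0, 8.0500],
    ]
)

# Reuse the ORIGINAL index so rows line up even if it is not 0..n-1.
imputed_titanic = pd.DataFrame(imputed_values, columns=num_cols, index=titanic.index)

# Option 1: overwrite the imputed columns wholesale on a copy.
titanic_filled = titanic.copy()
titanic_filled[num_cols] = imputed_titanic

# Option 2: fill only the cells that are missing; fillna aligns the
# replacement frame on both index and column labels.
titanic_patched = titanic.fillna(imputed_titanic)

print(titanic_filled)
print(titanic_patched)
```

Option 1 replaces SibSp and Parch with float columns, because the imputer returns floats. Option 2 only touches cells that were NaN, which may be preferable when the other values should stay exactly as they were.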
 
    