I have two dataframes, train and test. The test set has missing values in one of its columns.
import numpy as np
import pandas as pd
train = [[0,1],[0,2],[0,3],[0,7],[0,7],[1,3],[1,5],[1,2],[1,2]]
test = [[0,0],[0,np.nan],[1,0],[1,np.nan]]
train = pd.DataFrame(train, columns = ['A','B'])
test = pd.DataFrame(test, columns = ['A','B'])
The test set has two missing values in column B. I want to fill them with statistics computed on train, grouped by column A:
- If the imputing strategy is mode, then the missing values should be imputed with 7 and 2.
- If the imputing strategy is mean, then the missing values should be (1+2+3+7+7)/5 = 4 and (3+5+2+2)/4 = 3.
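To make the expected output concrete, here is a minimal sketch of the behaviour described above. It assumes the group statistics come from train only and are looked up through column A; `impute_by_group` is a hypothetical helper name used for illustration, not a pandas function, and taking the first value when the mode is tied is also an assumption.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame(
    [[0, 1], [0, 2], [0, 3], [0, 7], [0, 7], [1, 3], [1, 5], [1, 2], [1, 2]],
    columns=["A", "B"],
)
test = pd.DataFrame([[0, 0], [0, np.nan], [1, 0], [1, np.nan]], columns=["A", "B"])


def impute_by_group(train, test, group_col, target_col, strategy="mean"):
    """Hypothetical helper: fill NaNs in test[target_col] with per-group
    statistics computed on train only."""
    grouped = train.groupby(group_col)[target_col]
    if strategy == "mean":
        stats = grouped.mean()
    elif strategy == "mode":
        # Series.mode() may return several values on ties; take the first one.
        stats = grouped.agg(lambda s: s.mode().iloc[0])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    out = test.copy()  # leave the caller's test frame untouched
    # Map each test row's group key to its train statistic, then fill only the NaNs.
    out[target_col] = out[target_col].fillna(out[group_col].map(stats))
    return out


imputed_mean = impute_by_group(train, test, "A", "B", strategy="mean")
imputed_mode = impute_by_group(train, test, "A", "B", strategy="mode")
print(imputed_mean["B"].tolist())  # [0.0, 4.0, 0.0, 3.0]
print(imputed_mode["B"].tolist())  # [0.0, 7.0, 0.0, 2.0]
```

With this sketch, a group that appears in test but not in train has no statistic to look up, so its missing values stay NaN.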
What is a good way to do this?
This question is related, but uses only one dataframe instead of two.