I am trying to impute missing value NA with interpolation by multiple groups.
I just subset a simple example:
Year ST CC ID MP PS
2002 15 3 3 NA 1.5
2003 15 3 3 NA 1.5
2004 15 3 3 193 1.5
2005 15 3 3 193 1.5
2006 15 3 3 348 1.5
2007 15 3 3 388 1.5
2008 15 3 3 388 1.5
1999 53 33 1 NA 3.4
2000 53 33 1 NA 3.4
2002 53 33 1 NA 2.9
2003 53 33 1 NA 2.6
2004 53 33 1 NA 2.6
2005 53 33 1 170 3.8
2006 53 33 1 170 3.0
2007 53 33 1 330 4.2
2008 53 33 1 330 5.0
I used na.approx() but got wrong. It seems that my data has missing value in the first observation within each group.
setDT(dt)[, MP_interpolate := na.approx(MP, na.rm = T), .(Year, ST, CC, ID)]
setDT(dt)[, MP_interpolate := if(length(na.omit(MP))<2) MP else na.approx(MP, na.rm=TRUE), .(Year, ST, CC, ID)]
I also tried package imputeTS but not work.
Both are incorrect. Does it mean using interpolation to impute missing is not a good idea?
I'm not sure which imputation way is better (Do not want to use mean or median). I'm thinking about using PS trend to impute MP. (Just my thought, not question)