Suppose df holds my forecast results alongside the actual values. The rows from September 2021 to February 2022 are the backtest period, and the rows from March 2022 onward are the forecast period:
date pred actual
0 2021-9-30 14.88 27.70
1 2021-10-31 6.59 26.62
2 2021-11-30 5.88 21.49
3 2021-12-31 7.29 20.58
4 2022-1-31 9.79 24.00
5 2022-2-28 14.74 6.10
6 2022-3-31 9.47 NaN
7 2022-4-30 7.85 NaN
8 2022-5-31 4.81 NaN
9 2022-6-30 3.49 NaN
Now I want to correct the pred column using the actual column from the backtest data, so that the predicted value in the last backtest period is as close as possible to the actual value. How can I do this? Thank you very much.
For example, we could subtract 8.64 from the pred column (14.74 - 6.10 = 8.64), or build a polynomial regression between the pred and actual columns.
One possible approach, with its expected output:
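# Index of the last row that has an actual value (end of the backtest period)
last_valid_id = df['actual'].last_valid_index()
# Gap between prediction and actual at that row
gap = df.loc[last_valid_id, 'pred'] - df.loc[last_valid_id, 'actual']
# Shift every prediction by that constant gap
df['adj_pred'] = df['pred'] - gap
df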
Out:
date pred actual adj_pred
0 2021-9-30 14.88 27.70 6.242131
1 2021-10-31 6.59 26.62 -2.049860
2 2021-11-30 5.88 21.49 -2.756469
3 2021-12-31 7.29 20.58 -1.345215
4 2022-1-31 9.79 24.00 1.152847
5 2022-2-28 14.74 6.10 6.099557
6 2022-3-31 9.47 NaN 0.834391
7 2022-4-30 7.85 NaN -0.792580
8 2022-5-31 4.81 NaN -3.826918
9 2022-6-30 3.49 NaN -5.150675
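The regression idea mentioned in the question could be sketched roughly as follows. This is only an illustration, not a recommended model: the degree (`deg=1`) and the column names `reg_pred` and `reg_adj_pred` are assumptions made for the example, and the final step re-anchors the fitted values on the last backtest row so that they match the last actual value exactly.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-09-30", "2021-10-31", "2021-11-30", "2021-12-31", "2022-01-31",
        "2022-02-28", "2022-03-31", "2022-04-30", "2022-05-31", "2022-06-30",
    ]),
    "pred": [14.88, 6.59, 5.88, 7.29, 9.79, 14.74, 9.47, 7.85, 4.81, 3.49],
    "actual": [27.70, 26.62, 21.49, 20.58, 24.00, 6.10] + [np.nan] * 4,
})

# Backtest rows are the ones where an actual value exists.
backtest = df.dropna(subset=["actual"])

# Fit actual ~ polynomial(pred) on the backtest rows.
# deg=1 (a straight line) is an assumption; a higher degree overfits quickly
# with only six points.
coeffs = np.polyfit(backtest["pred"].to_numpy(), backtest["actual"].to_numpy(), deg=1)

# Apply the fitted mapping to every prediction, including the forecast rows.
df["reg_pred"] = np.polyval(coeffs, df["pred"].to_numpy())

# Re-anchor: shift the fitted values so the last backtest row matches its actual.
last_idx = df["actual"].last_valid_index()
residual = df.loc[last_idx, "actual"] - df.loc[last_idx, "reg_pred"]
df["reg_adj_pred"] = df["reg_pred"] + residual

print(df[["date", "pred", "actual", "reg_adj_pred"]])
```

With so few backtest points the fitted coefficients are fragile, so it is worth comparing the result against the simple constant shift before relying on it.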