I am trying to look for a method to find a Player's mean score for the "Last Season" (Previous Year) and add it in a new column in the original dataframe df.
I have coded a formula to get a Player's mean score for the current year, excluding the current row, which is as follows:
df['Season Avg'] = df.groupby([df['Player'], df['DateTime'].dt.year])['Score']
.apply(lambda x: x.shift(1).expanding().mean())
However, despite my best attempt at using the shift function, I can not quite work out how to calculate the previous years mean ("Last Season Avg") directly into a new column.
The dataframe is set out as follows:
| Player | DateTime | Score | Season Avg |
|---|---|---|---|
| PlayerB | 2020-MM-DD HH:MM:SS | 40 | NaN |
| PlayerA | 2020-MM-DD HH:MM:SS | 50 | NaN |
| PlayerA | 2021-MM-DD HH:MM:SS | 100 | NaN |
| PlayerB | 2021-MM-DD HH:MM:SS | 200 | NaN |
| PlayerA | 2021-MM-DD HH:MM:SS | 160 | 100 |
| PlayerB | 2021-MM-DD HH:MM:SS | 140 | 200 |
| PlayerB | 2021-MM-DD HH:MM:SS | 160 | 170 |
| PlayerA | 2021-MM-DD HH:MM:SS | 200 | 130 |
The new ideal dataframe that I would like:
| Player | DateTime | Score | Season Avg | Last Season Avg |
|---|---|---|---|---|
| PlayerB | 2020-MM-DD HH:MM:SS | 40 | NaN | NaN |
| PlayerA | 2020-MM-DD HH:MM:SS | 50 | NaN | NaN |
| PlayerA | 2021-MM-DD HH:MM:SS | 100 | NaN | 50 |
| PlayerB | 2021-MM-DD HH:MM:SS | 200 | NaN | 40 |
| PlayerA | 2021-MM-DD HH:MM:SS | 160 | 100 | 50 |
| PlayerB | 2021-MM-DD HH:MM:SS | 140 | 200 | 40 |
| PlayerB | 2021-MM-DD HH:MM:SS | 160 | 170 | 40 |
| PlayerA | 2021-MM-DD HH:MM:SS | 200 | 130 | 50 |