I have a pandas DataFrame like this:
| fruit | year | price | 
|---|---|---|
| apple | 2018 | 4 | 
| apple | 2019 | 3 | 
| apple | 2020 | 5 | 
| plum | 2019 | 3 | 
| plum | 2020 | 2 | 
I want to add a column `last_year_price`. How can I do this?
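For this, you can use groupby and shift. Selecting the price column before shifting avoids shifting every other column as well. This assumes the rows are already sorted by year within each fruit:
df['last_year_price'] = df.groupby('fruit')['price'].shift(1)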
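To check the groupby and shift approach against the data in the question, here is a minimal sketch. The frame is rebuilt by hand from the table above, and the name `df` is only for illustration. The first year of each fruit has no earlier row, so it comes out as NaN.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "plum", "plum"],
    "year": [2018, 2019, 2020, 2019, 2020],
    "price": [4, 3, 5, 3, 2],
})

# Within each fruit, take the price from the previous row.
# The rows are already ordered by year here, so "previous row"
# means "previous year" for this particular data.
df["last_year_price"] = df.groupby("fruit")["price"].shift(1)

print(df)
```

The new column becomes float because NaN marks the rows without a previous year.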
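You can use the shift function. Sort by year first so the shift picks up the previous year's row; the result is aligned back to the original rows by index: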
df['last_year_price'] = df.sort_values(by=['year'], ascending=True).groupby(['fruit'])['price'].shift(1)
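Note that shift returns the previous row in each group, which is only the previous calendar year when no years are missing. If gaps are possible, one alternative is to join the frame to a copy of itself with the year moved forward by one. The sketch below is an assumption about what you want, not part of the answers above; the extra `plum` 2017 row is hypothetical and added only to show a gap.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "plum", "plum", "plum"],
    "year": [2018, 2019, 2020, 2019, 2020, 2017],  # last row is hypothetical
    "price": [4, 3, 5, 3, 2, 6],                   # last row is hypothetical
})

# Copy of the prices, relabelled so each row describes "the year after".
prev = df[["fruit", "year", "price"]].rename(columns={"price": "last_year_price"})
prev["year"] = prev["year"] + 1

# A left join keeps every original row; rows with no match on
# (fruit, year - 1) get NaN instead of a price from an older year.
result = df.merge(prev, on=["fruit", "year"], how="left")

print(result)
```

Here plum 2019 gets NaN because there is no plum row for 2018, whereas a sorted shift would have returned the 2017 price.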
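If you instead want the price from each fruit's most recent year, use DataFrameGroupBy.idxmax to select the rows with the maximal year and join them to the original DataFrame:
# Rows holding the latest year for each fruit, with price renamed.
latest = df.loc[df.groupby('fruit')['year'].idxmax(), ['fruit', 'price']].rename(columns={'price': 'last_year_price'})
df = df.merge(latest, on='fruit', how='left')
print(df)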
   fruit  year  price  last_year_price
0  apple  2018      4                5
1  apple  2019      3                5
2  apple  2020      5                5
3   plum  2019      3                2
4   plum  2020      2                2