I have the following df:
df = pd.DataFrame({'block': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'B': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
'C': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})
Which outputs:
block B C
0 1 a 1000
1 1 b 2000
2 1 c 3000
3 2 a 4000
4 2 b 5000
5 2 c 6000
6 3 a 7000
7 3 b 8000
8 3 c 9000
I want to output a diff_column (could also be an entirely new_df) who's values are the difference between the values in the C column grouped by the B column for every different value in the block column.
That is, what's the difference between a's value in block 2 and a's value in block 1?
Example for a:
4000 - 1000 = 3000
Note, the rows for block == 1 would be empty since there is no other previous block.
I have tried:
df['diff_column'] = df.groupby(['block', 'B'])['C'] - df.shift[-1].groupby(['block', 'B'])['C']
with no success.