I have a transactional operation that produces a feed like the one below:
import pandas as pd
df = pd.DataFrame({'action': ['transacted', 'transacted', 'transacted', 'transacted', 'undo', 'transacted', 'transacted', 'transacted', 'transacted', 'transacted', 'undo', 'undo', 'undo', 'transacted'],
                   'transaction_count': [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90]})
| | action | transaction_count |
|---|---|---|
| 0 | transacted | 10 |
| 1 | transacted | 20 |
| 2 | transacted | 35 |
| 3 | transacted | 60 |
| 4 | undo | 60 |
| 5 | transacted | 60 |
| 6 | transacted | 80 |
| 7 | transacted | 90 |
| 8 | transacted | 100 |
| 9 | transacted | 10 |
| 10 | undo | 10 |
| 11 | undo | 100 |
| 12 | undo | 90 |
| 13 | transacted | 90 |
The counts follow a repeating pattern, but not a linear one (10-20-35-60-80-90-100-10-20...).
An undo row states which transaction count is cancelled.
There can be several undo rows in a row, one for each cancelled transaction.
# My initial attempt: flag every row that is directly followed by an undo
df['is_undone'] = df.apply(lambda x: 1 if x['action'] == 'undo' else 0, axis=1).shift(-1)
df = df.fillna(0)  # shift(-1) leaves NaN in the last row
df = df.loc[df['is_undone'] == 0]  # drop the rows flagged as undone
df = df.loc[df['action'] != 'undo']  # drop the undo rows themselves
df.reset_index(drop=True, inplace=True)
Unfortunately, this only works for a single undo, not for several in a row. apply does not give access to neighbouring rows' values, and I can't think of another solution. It also needs to process about 300k rows, so performance is an issue as well.
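To make the intended behaviour concrete, here is a minimal sketch written as a plain loop. It assumes that each undo cancels the most recent transaction that has not already been cancelled, so the surviving rows can be tracked with a stack. The name drop_undone is a hypothetical helper, and silently ignoring an undo when nothing is left to cancel is an assumption of the sketch, not a rule taken from the feed.

```python
import pandas as pd


def drop_undone(df: pd.DataFrame) -> pd.DataFrame:
    """Drop undo rows and the transactions they cancel (hypothetical helper)."""
    kept = []  # stack of row positions for transactions that still stand
    for pos, action in enumerate(df["action"].to_numpy()):
        if action == "undo":
            # Assumption: an undo cancels the latest surviving transaction;
            # an undo with nothing left to cancel is ignored.
            if kept:
                kept.pop()
        else:
            kept.append(pos)
    # Select the surviving rows by position and renumber the index
    return df.iloc[kept].reset_index(drop=True)


df = pd.DataFrame({
    "action": ["transacted", "transacted", "transacted", "transacted", "undo",
               "transacted", "transacted", "transacted", "transacted", "transacted",
               "undo", "undo", "undo", "transacted"],
    "transaction_count": [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90],
})

result = drop_undone(df)
print(result)
```

On the sample feed this keeps the counts 10, 20, 35, 60, 80, 90. It is a single pass over the action column rather than a row-wise apply, though I have not measured it on 300k rows.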
Expected result is:
| | action | transaction_count |
|---|---|---|
| 0 | transacted | 10 |
| 1 | transacted | 20 |
| 2 | transacted | 35 |
| 3 | transacted | 60 |
| 4 | transacted | 80 |
| 5 | transacted | 90 |
Thanks in advance!