I want to use bfill on a pandas DataFrame, but I want the value used for each backfill to depend on the values in the row it is filled from.
Example input:
            type  val
2018-12-31     H    1
2019-03-31   NaN  NaN
2019-06-30     Q    2
2019-07-31   NaN  NaN
2019-08-31     H    3
2019-09-30     Y    4
2019-12-31     Q    5
Expected output:
            type  val
2018-12-31     H    1
2019-03-31     Q    2   <-- Same as 2019-06-30
2019-06-30     Q    2
2019-07-31     Q    6   <-- Double 2019-08-31
2019-08-31     H    3
2019-09-30     Y    4
2019-12-31     Q    5
In this example, the backfilled value for 2019-07-31 is 6 because the next row (2019-08-31) has type H, so its value of 3 is doubled. The backfilled value for 2019-03-31, on the other hand, is the same as the next row's value because that row's type is Q.
Rules:
- Type H: double the value for backfill
- Type Q and Y: keep the value for backfill
- All types: set type to Q
I could not find any straightforward built-in way of doing this. I need to do this on a very large DataFrame, so speed is important to me, which is why I can't loop over the rows.
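For reference, here is a minimal sketch of one loop-free way the rules above might be expressed. It is not a built-in pandas feature, just a combination of `bfill` and `numpy.where`. It assumes that a row needs filling exactly when `val` is NaN, that `type` is missing on the same rows, and that trailing gaps with no later row to fill from should stay untouched. The names `src_type`, `src_val`, `factor`, and `to_fill` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example input from the question.
df = pd.DataFrame(
    {
        "type": ["H", np.nan, "Q", np.nan, "H", "Y", "Q"],
        "val": [1, np.nan, 2, np.nan, 3, 4, 5],
    },
    index=pd.to_datetime(
        [
            "2018-12-31", "2019-03-31", "2019-06-30", "2019-07-31",
            "2019-08-31", "2019-09-30", "2019-12-31",
        ]
    ),
)

# Plain backfill of both columns: for every row, this is the type and
# value of the next populated row (or the row itself if it is populated).
src_type = df["type"].bfill()
src_val = df["val"].bfill()

# Multiplier chosen from the *source* row's type: 2 for H, 1 otherwise.
factor = np.where(src_type.eq("H"), 2, 1)

# Only touch rows that were missing and actually have a later source row
# (assumption: trailing gaps are left as NaN).
to_fill = df["val"].isna() & src_val.notna()

out = df.copy()
out.loc[to_fill, "val"] = (src_val * factor)[to_fill]
out.loc[to_fill, "type"] = "Q"  # every backfilled row becomes type Q

print(out)
```

Everything here operates on whole columns, so there is no Python-level loop over rows. If rows can be missing in `type` but not in `val` (or the other way round), the `to_fill` mask would need to be adjusted.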