| patient_id | alert_id | alert_timestamp |
|---|---|---|
| 3 | xyz | 2022-10-10 |
| 1 | anp | 2022-10-12 |
| 1 | gfe | 2022-10-10 |
| 2 | fgy | 2022-10-02 |
| 2 | gpl | 2022-10-03 |
| 1 | gdf | 2022-10-13 |
| 2 | mkd | 2022-10-23 |
| 1 | liu | 2022-10-01 |
I have a data frame (see the simplified version above). For each patient_id, I want to keep only the latest alert (i.e. the last one) sent out within a given window, e.g. window_size = 7 days.
Note that the window covers consecutive days, i.e. from day 1 to day 1 + window_size. The range of alert_timestamp values varies per patient_id and usually extends well beyond window_size.
Note also that the data frame above is a very simple example: the real data has many more patient_ids, and the rows are in mixed order in terms of alert_timestamp and alert_id.
The approach is to start from the last alert_timestamp for a given patient_id and work backwards using window_size, selecting the alert that was the last one in each window.
Please note that the idea is to have a scanning window (e.g. window_size = 7 days) that moves across the timestamps of each patient.
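
To make the intended logic concrete, here is a minimal sketch of the backward scan, assuming the data is a pandas DataFrame (the library is my assumption here). The helper name keep_latest_per_window is hypothetical. It also assumes that an alert fewer than window_size days older than the most recently kept alert belongs to that alert's window; whether a gap of exactly window_size days starts a new window is a boundary choice, not something fixed by the description above.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "patient_id": [3, 1, 1, 2, 2, 1, 2, 1],
    "alert_id": ["xyz", "anp", "gfe", "fgy", "gpl", "gdf", "mkd", "liu"],
    "alert_timestamp": pd.to_datetime([
        "2022-10-10", "2022-10-12", "2022-10-10", "2022-10-02",
        "2022-10-03", "2022-10-13", "2022-10-23", "2022-10-01",
    ]),
})


def keep_latest_per_window(frame, window_size=7):
    """Hypothetical helper: scan each patient's alerts from newest to oldest
    and keep only the newest alert of each window."""
    window = pd.Timedelta(days=window_size)
    # Newest alert first within each patient.
    ordered = frame.sort_values(
        ["patient_id", "alert_timestamp"], ascending=[True, False]
    )
    kept_index = []
    for _, group in ordered.groupby("patient_id", sort=False):
        anchor = None  # timestamp of the most recently kept alert
        for idx, ts in group["alert_timestamp"].items():
            # Assumption: a gap of at least window_size days opens a new window.
            if anchor is None or anchor - ts >= window:
                kept_index.append(idx)
                anchor = ts
    return (
        frame.loc[kept_index]
        .sort_values(["patient_id", "alert_timestamp"])
        .reset_index(drop=True)
    )


result = keep_latest_per_window(df, window_size=7)
print(result)
```

The inner loop is sequential because each kept alert depends on the previously kept one, which is the part I would like to make more efficient.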
The end result I want is a data frame containing only the alerts that remain after filtering.
Expected output for this example with window_size = 7:
| patient_id | alert_id | alert_timestamp |
|---|---|---|
| 1 | liu | 2022-10-01 |
| 1 | gdf | 2022-10-13 |
| 2 | gpl | 2022-10-03 |
| 2 | mkd | 2022-10-23 |
| 3 | xyz | 2022-10-10 |
What's the most efficient way to solve this?