I have a df as follows:
+-----+------+--------+--------------------+------+---------+
| ID1 | ID2  | DOC_NO |        DATE        | COST | CLIENT  |
+-----+------+--------+--------------------+------+---------+
| ABC | A123 |      1 | 2021-01-01 0:10:00 |   11 | ABC123  |
| DEF | B456 |      2 | 2021-01-01 0:10:00 |   12 | DEF256  |
| GHI | C789 |      3 | 2021-01-01 0:10:00 |   13 | GHI389  |
| JKL | D890 |      4 | 2021-01-01 0:10:00 |   14 | JKL490  |
| MNO | E012 |      5 | 2021-01-01 0:10:00 |   15 | MNO512  |
| ABC | A123 |      6 | 2021-01-01 0:15:00 |   11 | ABC623  |
| DEF | B456 |      7 | 2021-01-01 0:15:00 |   12 | DEF756  |
| GHI | C789 |      8 | 2021-01-01 0:15:00 |   13 | GHI889  |
| JKL | D890 |      9 | 2021-01-02 0:15:00 |   14 | JKL990  |
| MNO | E012 |     10 | 2021-01-03 0:15:00 |   15 | MNO1012 |
| ABC | A123 |     11 | 2021-01-03 0:20:00 |   10 | GHI890  |
| DEF | B456 |     12 | 2021-01-03 0:20:00 |   11 | JKL991  |
| GHI | C789 |     13 | 2021-01-03 0:20:00 |   12 | MNO1013 |
| JKL | D890 |     14 | 2021-01-03 0:20:00 |   13 | GHI891  |
| MNO | E012 |     15 | 2021-01-03 0:20:00 |   14 | JKL992  |
| ABC | A123 |     16 | 2021-01-03 0:20:00 |   12 | MNO1014 |
| DEF | B456 |     17 | 2021-01-03 0:20:00 |   13 | GHI892  |
| GHI | C789 |     18 | 2021-01-03 0:20:00 |   14 | JKL993  |
| JKL | D890 |     19 | 2021-01-03 0:20:00 |   15 | MNO1015 |
| MNO | E012 |     20 | 2021-01-03 0:20:00 |   16 | GHI893  |
| ABC | A123 |     21 | 2021-01-03 0:25:00 |   11 | ABC124  |
| DEF | B456 |     22 | 2021-01-03 0:25:00 |   12 | DEF257  |
| GHI | C789 |     23 | 2021-01-03 0:25:00 |   13 | GHI390  |
| JKL | D890 |     24 | 2021-01-03 0:25:00 |   14 | JKL491  |
| MNO | E012 |     25 | 2021-01-03 0:25:00 |   15 | MNO513  |
+-----+------+--------+--------------------+------+---------+
I want to group by ID1 and ID2 and arrange the df by DOC_NO and DATE. After that, I want to create a new column REFERENCE_COST (shown as REF_COST below), where the REFERENCE_COST is the highest COST seen so far within each group in that arrangement. In other words, if the COST increases with DATE and DOC_NO, the higher COST becomes the new REFERENCE_COST. So the new df would look as follows:
+-----+------+--------+--------------------+------+---------+----------+
| ID1 | ID2  | DOC_NO |        DATE        | COST | CLIENT  | REF_COST |
+-----+------+--------+--------------------+------+---------+----------+
| ABC | A123 |      1 | 2021-01-01 0:10:00 |   11 | ABC123  |       11 |
| DEF | B456 |      2 | 2021-01-01 0:10:00 |   12 | DEF256  |       12 |
| GHI | C789 |      3 | 2021-01-01 0:10:00 |   13 | GHI389  |       13 |
| JKL | D890 |      4 | 2021-01-01 0:10:00 |   14 | JKL490  |       14 |
| MNO | E012 |      5 | 2021-01-01 0:10:00 |   15 | MNO512  |       15 |
| ABC | A123 |      6 | 2021-01-01 0:15:00 |   11 | ABC623  |       11 |
| DEF | B456 |      7 | 2021-01-01 0:15:00 |   12 | DEF756  |       12 |
| GHI | C789 |      8 | 2021-01-01 0:15:00 |   13 | GHI889  |       13 |
| JKL | D890 |      9 | 2021-01-02 0:15:00 |   14 | JKL990  |       14 |
| MNO | E012 |     10 | 2021-01-03 0:15:00 |   15 | MNO1012 |       15 |
| ABC | A123 |     11 | 2021-01-03 0:20:00 |   10 | GHI890  |       11 |
| DEF | B456 |     12 | 2021-01-03 0:20:00 |   11 | JKL991  |       12 |
| GHI | C789 |     13 | 2021-01-03 0:20:00 |   12 | MNO1013 |       13 |
| JKL | D890 |     14 | 2021-01-03 0:20:00 |   13 | GHI891  |       14 |
| MNO | E012 |     15 | 2021-01-03 0:20:00 |   14 | JKL992  |       15 |
| ABC | A123 |     16 | 2021-01-03 0:20:00 |   12 | MNO1014 |       12 |
| DEF | B456 |     17 | 2021-01-03 0:20:00 |   13 | GHI892  |       13 |
| GHI | C789 |     18 | 2021-01-03 0:20:00 |   14 | JKL993  |       14 |
| JKL | D890 |     19 | 2021-01-03 0:20:00 |   15 | MNO1015 |       15 |
| MNO | E012 |     20 | 2021-01-03 0:20:00 |   16 | GHI893  |       16 |
| ABC | A123 |     21 | 2021-01-03 0:25:00 |   11 | ABC124  |       12 |
| DEF | B456 |     22 | 2021-01-03 0:25:00 |   12 | DEF257  |       13 |
| GHI | C789 |     23 | 2021-01-03 0:25:00 |   13 | GHI390  |       14 |
| JKL | D890 |     24 | 2021-01-03 0:25:00 |   14 | JKL491  |       15 |
| MNO | E012 |     25 | 2021-01-03 0:25:00 |   15 | MNO513  |       16 |
+-----+------+--------+--------------------+------+---------+----------+
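The REF_COST column in the table above matches a per-group running maximum of COST, so one possible way to compute it is `cummax()` inside a grouped `mutate()`. This is only a minimal sketch: `df_small` is a hypothetical subset holding just the ABC/A123 rows, and it assumes DATE has been parsed to POSIXct.

```r
library(dplyr)

# Hypothetical subset of the data above: only the ABC / A123 group.
df_small <- data.frame(
  ID1    = "ABC",
  ID2    = "A123",
  DOC_NO = c(1, 6, 11, 16, 21),
  DATE   = as.POSIXct(
    c("2021-01-01 0:10:00", "2021-01-01 0:15:00", "2021-01-03 0:20:00",
      "2021-01-03 0:20:00", "2021-01-03 0:25:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  COST   = c(11, 11, 10, 12, 11),
  stringsAsFactors = FALSE
)

with_ref <- df_small %>%
  group_by(ID1, ID2) %>%
  arrange(DATE, DOC_NO, .by_group = TRUE) %>%
  # Running maximum of COST within each (ID1, ID2) group.
  mutate(REF_COST = cummax(COST)) %>%
  ungroup()
```

For this group, REF_COST stays at 11 until DOC_NO 16 raises it to 12, which agrees with the table.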
Now, I want to compare the REFERENCE_COST with the COST and keep only the rows where the COST is less than the REFERENCE_COST. I also want to add two new columns, DATE_LAST_REF_COST_MET and CLIENT_LAST_REF_COST_MET, which show the DATE and the CLIENT of the most recent row in which the COST met the REFERENCE_COST. So the resulting df would be as follows:
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
| ID1 | ID2  | DOC_NO |        DATE        | COST | CLIENT  | REF_COST | DATE_LAST_REF_COST_MET | CLIENT_LAST_REF_COST_MET |
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
| ABC | A123 |     11 | 2021-01-03 0:20:00 |   10 | GHI890  |       11 | 2021-01-01 0:15:00     | ABC623                   |
| DEF | B456 |     12 | 2021-01-03 0:20:00 |   11 | JKL991  |       12 | 2021-01-01 0:15:00     | DEF756                   |
| GHI | C789 |     13 | 2021-01-03 0:20:00 |   12 | MNO1013 |       13 | 2021-01-01 0:15:00     | GHI889                   |
| JKL | D890 |     14 | 2021-01-03 0:20:00 |   13 | GHI891  |       14 | 2021-01-02 0:15:00     | JKL990                   |
| MNO | E012 |     15 | 2021-01-03 0:20:00 |   14 | JKL992  |       15 | 2021-01-03 0:15:00     | MNO1012                  |
| ABC | A123 |     21 | 2021-01-03 0:25:00 |   11 | ABC124  |       12 | 2021-01-03 0:20:00     | MNO1014                  |
| DEF | B456 |     22 | 2021-01-03 0:25:00 |   12 | DEF257  |       13 | 2021-01-03 0:20:00     | GHI892                   |
| GHI | C789 |     23 | 2021-01-03 0:25:00 |   13 | GHI390  |       14 | 2021-01-03 0:20:00     | JKL993                   |
| JKL | D890 |     24 | 2021-01-03 0:25:00 |   14 | JKL491  |       15 | 2021-01-03 0:20:00     | MNO1015                  |
| MNO | E012 |     25 | 2021-01-03 0:25:00 |   15 | MNO513  |       16 | 2021-01-03 0:20:00     | GHI893                   |
+-----+------+--------+--------------------+------+---------+----------+------------------------+--------------------------+
This is what I was able to do:
df %>%
  group_by(ID1, ID2) %>%
  arrange(DATE, DOC_NO, .by_group = TRUE) %>%
  mutate(diff = COST - lag(COST, default = first(COST))) %>%
  mutate(REF_COST = case_when(diff < 0 ~ lag(COST), TRUE ~ diff)) %>%
  mutate(DATE_LAST_REF_COST_MET = case_when(diff < 0 ~ lag(DATE), TRUE ~ DATE)) %>%
  mutate(CLIENT_LAST_REF_COST_MET = case_when(diff < 0 ~ lag(CLIENT), TRUE ~ CLIENT))
The limitation with this is that it doesn't update the REFERENCE_COST as DATE and DOC_NO advance: each row is only compared with the row immediately before it, not with the highest COST seen so far.
I'm not sure how to achieve this.
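For reference, here is a rough sketch of the whole transformation as I understand it from the expected tables, not a definitive solution. It assumes REF_COST is the running maximum of COST per group, and that the "last met" row is the most recent row whose COST equals REF_COST. `df_small` (two of the five groups) and the helper column `last_met` are names made up for this illustration.

```r
library(dplyr)

# Hypothetical subset of the data above: the ABC and DEF groups only.
df_small <- data.frame(
  ID1    = rep(c("ABC", "DEF"), times = 5),
  ID2    = rep(c("A123", "B456"), times = 5),
  DOC_NO = c(1, 2, 6, 7, 11, 12, 16, 17, 21, 22),
  DATE   = as.POSIXct(
    c("2021-01-01 0:10:00", "2021-01-01 0:10:00",
      "2021-01-01 0:15:00", "2021-01-01 0:15:00",
      "2021-01-03 0:20:00", "2021-01-03 0:20:00",
      "2021-01-03 0:20:00", "2021-01-03 0:20:00",
      "2021-01-03 0:25:00", "2021-01-03 0:25:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  COST   = c(11, 12, 11, 12, 10, 11, 12, 13, 11, 12),
  CLIENT = c("ABC123", "DEF256", "ABC623", "DEF756", "GHI890",
             "JKL991", "MNO1014", "GHI892", "ABC124", "DEF257"),
  stringsAsFactors = FALSE
)

result <- df_small %>%
  group_by(ID1, ID2) %>%
  arrange(DATE, DOC_NO, .by_group = TRUE) %>%
  mutate(
    # Assumption: the reference cost is the running maximum within the group.
    REF_COST = cummax(COST),
    # Within-group position of the latest row whose COST met REF_COST.
    # The first row of a group always meets it, so this is never 0.
    last_met = cummax(ifelse(COST == REF_COST, row_number(), 0L)),
    DATE_LAST_REF_COST_MET   = DATE[last_met],
    CLIENT_LAST_REF_COST_MET = CLIENT[last_met]
  ) %>%
  ungroup() %>%
  # Keep only the rows that fell below the reference cost.
  filter(COST < REF_COST) %>%
  select(-last_met) %>%
  arrange(DOC_NO)
```

On this subset, `result` keeps DOC_NO 11, 12, 21 and 22, with ABC623, DEF756, MNO1014 and GHI892 as CLIENT_LAST_REF_COST_MET, which agrees with the expected table. The running index avoids a self-join, but it relies on the rows being sorted within each group before `mutate()` runs.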
 
    