Situation
I have a dataframe similar to the one below (I've removed many of the rows for this example, as the gaps in the 'index' column show):
df
| index | id | name | last_updated |
|---|---|---|---|
| 0 | 1518 | Maker | 2022-12-31T03:02:00.000Z |
| 1 | 1518 | Maker | 2022-12-31T02:02:00.000Z |
| 2 | 1518 | Maker | 2022-12-31T14:02:00.000Z |
| 3 | 1518 | Maker | 2022-12-31T16:02:00.000Z |
| 23 | 1518 | Maker | 2022-12-31T17:02:00.000Z |
| 24 | 2280 | Filecoin | 2022-12-31T01:02:00.000Z |
| 25 | 2280 | Filecoin | 2022-12-31T03:01:00.000Z |
| 26 | 2280 | Filecoin | 2022-12-31T02:01:00.000Z |
| 27 | 2280 | Filecoin | 2022-12-31T00:02:00.000Z |
| 47 | 2280 | Filecoin | 2022-12-31T08:02:00.000Z |
| 48 | 4558 | Flow | 2022-12-31T01:02:00.000Z |
| 49 | 4558 | Flow | 2022-12-31T02:01:00.000Z |
| 71 | 4558 | Flow | 2022-12-31T05:02:00.000Z |
| 72 | 5026 | Orchid | 2022-12-31T01:02:00.000Z |
| 73 | 5026 | Orchid | 2022-12-31T03:02:00.000Z |
| 74 | 5026 | Orchid | 2022-12-31T02:01:00.000Z |
| 75 | 5026 | Orchid | 2022-12-31T00:02:00.000Z |
I want a version of the above dataframe with only one row for each id value, keeping the last instance.
This is my code:
df.drop_duplicates(subset=['id'], keep='last')
Expectation
I expected the new df to retain only 4 rows: the 'last' instance for each 'id' value in dataframe df.
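One thing worth checking against this expectation: `keep='last'` keeps the last row in the current row order, not the row with the most recent `last_updated`. In the sample above, the last Orchid row (index 75) is stamped 00:02, while the most recent Orchid timestamp is 03:02 (index 73). If the most recent row per id is what is wanted (an assumption about the intent, not something stated above), a minimal sketch would sort by timestamp first. The names `sample`, `by_position`, and `latest` are illustrative only:

```python
import pandas as pd

# The four Orchid rows from the table above (index column omitted).
sample = pd.DataFrame({
    "id": [5026, 5026, 5026, 5026],
    "name": ["Orchid"] * 4,
    "last_updated": [
        "2022-12-31T01:02:00.000Z",
        "2022-12-31T03:02:00.000Z",
        "2022-12-31T02:01:00.000Z",
        "2022-12-31T00:02:00.000Z",
    ],
})

# Parse the ISO strings so sorting is chronological rather than lexical.
sample["last_updated"] = pd.to_datetime(sample["last_updated"])

# Keeps the last row by position: the 00:02 row.
by_position = sample.drop_duplicates(subset=["id"], keep="last")

# Sort by timestamp first, then keep the last row: the 03:02 row.
latest = (
    sample.sort_values("last_updated")
          .drop_duplicates(subset=["id"], keep="last")
)

print(by_position["last_updated"].iloc[0])
print(latest["last_updated"].iloc[0])
```

If the row order already reflects what "last" should mean, the sort step is unnecessary.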
Result
After running the drop_duplicates command, df is exactly the same dataframe, with the same shape as before my drop_duplicates attempt.
I've been trying to use this post to sort it out, but obviously there's something I'm not getting right:
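A likely cause, assuming the call was run exactly as written above with nothing assigned: `drop_duplicates` returns a new DataFrame and leaves `df` untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch on a small, hypothetical subset of the rows above:

```python
import pandas as pd

# Hypothetical subset of the table above: two ids repeated, two ids single.
df = pd.DataFrame({
    "id": [1518, 1518, 2280, 2280, 4558, 5026],
    "name": ["Maker", "Maker", "Filecoin", "Filecoin", "Flow", "Orchid"],
    "last_updated": [
        "2022-12-31T03:02:00.000Z",
        "2022-12-31T17:02:00.000Z",
        "2022-12-31T01:02:00.000Z",
        "2022-12-31T08:02:00.000Z",
        "2022-12-31T05:02:00.000Z",
        "2022-12-31T00:02:00.000Z",
    ],
})

# The result is computed but discarded, so df does not change.
df.drop_duplicates(subset=["id"], keep="last")
print(df.shape)       # (6, 3)

# Assigning the result keeps the deduplicated frame.
deduped = df.drop_duplicates(subset=["id"], keep="last")
print(deduped.shape)  # (4, 3)
```

Assigning to a new name such as `deduped` (or back to `df`) is generally easier to follow than `inplace=True`, but either should produce the 4-row result described above.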
pandas select rows with no duplicate
I'd appreciate any input on why the rows with duplicate 'id' values are not being dropped, leaving only the last instance of each.