Scenario: Assume a pd.DataFrame, loaded from an external source, where each row is one line from a sensor. The index is a DatetimeIndex, and for some rows df.index.duplicated() is True. This means there are lines with the same timestamp from different sensors.
When applying some logic, such as df.loc[df.A>0, 'my_col'] = 1, I ran into ValueError: cannot reindex from a duplicate axis. This can be solved by simply removing the duplicated rows using:
df[~df.index.duplicated()]
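To make the setup concrete, here is a minimal sketch with made-up data. The timestamps, the values, and the columns A and B are assumptions for illustration, not my real sensor data:

```python
import pandas as pd

# Hypothetical sample: two sensors report at the same timestamp (00:00).
idx = pd.DatetimeIndex(
    ["2021-01-01 00:00", "2021-01-01 00:00", "2021-01-01 00:01"]
)
df = pd.DataFrame({"A": [1.0, 3.0, -2.0], "B": [10, 20, 30]}, index=idx)

# True for every repeat of a timestamp that has already been seen.
dup_mask = df.index.duplicated()

# Keep only the first row per timestamp; the later duplicates are dropped.
deduped = df[~dup_mask]
```

With the default keep='first', the second 00:00 row is simply discarded, so its values never contribute to the result.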
But I wonder whether it is possible to apply a column-based function during the index de-duplication, e.g. calculating the mean/max/min of columns A/B/C for the duplicated rows.
Is this possible? It's something like a groupby.aggregate on the rows where df.index.duplicated() is True.
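The result I have in mind could be sketched roughly as follows, by grouping on the index itself. The sample data and the choice of one function per column are assumptions for illustration; I am not sure this is the idiomatic way to do it:

```python
import pandas as pd

# Hypothetical sample: the first two rows share a timestamp.
idx = pd.DatetimeIndex(
    ["2021-01-01 00:00", "2021-01-01 00:00", "2021-01-01 00:01"]
)
df = pd.DataFrame(
    {"A": [1.0, 3.0, -2.0], "B": [10, 20, 30], "C": [5, 7, 9]}, index=idx
)

# Group rows by index value (level 0) and aggregate each column separately.
# Timestamps that occur only once pass through unchanged.
aggregated = df.groupby(level=0).agg({"A": "mean", "B": "max", "C": "min"})
```

Here the two 00:00 rows collapse into a single row (A averaged, B maximum, C minimum), and the index of aggregated is unique.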