I have a dataframe with 2 columns: UserProductCombo, OrderDates. I have multiple order dates for each user/product group (1 to 5 dates per group).
I have sorted the data in descending order to get the top most order date for each group.
I would like to compute differences between the order dates for each group and put these in a new column in my dataframe (IN DAYS).
(i.e OrderDate1-OrderDate2, OrderDate1-OrderDate3, OrderDate1-OrderDate4, OrderDate1-OrderDate5) If not more than 2 orders exist, i want the it to move to the next group.
Sample data:
>>> bf_recency
        UserProduct               OrderDates
0   12111211/123232  2020-03-12 17:19:16.103
1   12111211/123232  2020-03-12 18:10:45.974
2   12111211/123232  2020-03-11 17:19:16.103
3   12111211/123232  2020-03-10 18:10:45.974
4   12111211/123232  2020-03-10 18:10:45.974
5   165870101/73066  2020-03-12 19:49:15.752
Expected Output:
        UserProduct               diff(in days)
0   12111211/123232               N/A
1   12111211/123232               0
2   12111211/123232               1
3   12111211/123232               2
4   12111211/123232               2
5   165870101/73066               N/A
So far I have this:
df_frequency =  df.groupby(["UserProduct"])['ORDER_DATE'].nlargest(5).reset_index(name ='OrderDates') 
df_frequency.sort_values(by=['OrderDates'],inplace=True, ascending=False)
df_freq = df_frequency.groupby(['UserProduct'])['OrderDates'].transform(lambda x: x.diff())  #STUCK HERE
 
    