I have a dataframe, as follows:
customer_id  order_ts order_nbr item_nbr
162038  04/04/23 18:42  1258    972395
162038  04/04/23 18:42  1258    551984
162038  04/04/23 18:42  1258    488298
162038  04/04/23 18:42  1258    649230
162038  26/02/23 16:28  2715    372225
162038  26/02/23 16:28  2715    911716
162038  26/02/23 16:28  2715    696677
162038  26/02/23 16:28  2715    229455
162038  26/02/23 16:28  2715    870016
162038  29/01/23 13:07  1171    113719
162038  29/01/23 13:07  1171    553461
162060  01/05/23 18:42  1259    300911
162060  01/05/23 18:42  1259    574962
162060  01/05/23 18:42  1259    843300
162060  01/05/23 18:42  1259    173719
162060  05/05/23 18:42  2719    254899
162060  05/05/23 18:42  2719    776553
162060  05/05/23 18:42  2719    244739
162060  05/05/23 18:42  2719    170742
162060  05/05/23 18:42  2719    525719
162060  10/05/23 18:42  1161    896919
162060  10/05/23 18:42  1161    759465
I am interested to create a column "row_num" as follows:
customer_id  order_ts order_nbr item_nbr    row_num
162038  04/04/23 18:42  1258    972395  1
162038  04/04/23 18:42  1258    551984  1
162038  04/04/23 18:42  1258    488298  1
162038  04/04/23 18:42  1258    649230  1
162038  26/02/23 16:28  2715    372225  2
162038  26/02/23 16:28  2715    911716  2
162038  26/02/23 16:28  2715    696677  2
162038  26/02/23 16:28  2715    229455  2
162038  26/02/23 16:28  2715    870016  2
162038  29/01/23 13:07  1171    113719  3
162038  29/01/23 13:07  1171    553461  3
162060  01/05/23 18:42  1259    300911  1
162060  01/05/23 18:42  1259    574962  1
162060  01/05/23 18:42  1259    843300  1
162060  01/05/23 18:42  1259    173719  1
162060  05/05/23 18:42  2719    254899  2
162060  05/05/23 18:42  2719    776553  2
162060  05/05/23 18:42  2719    244739  2
162060  05/05/23 18:42  2719    170742  2
162060  05/05/23 18:42  2719    525719  2
162060  10/05/23 18:42  1161    896919  3
162060  10/05/23 18:42  1161    759465  3
"row_num" column is basically unique for a particular customer_id, order_ts, order_nbr. For example, customer_id 162038, order_ts = 04/04/23 18:42, order_nbr = 1258, row_num = 1 for all the item numbers.
I did as follows:
windowDept = Window.partitionBy("customer_id","order_ts","order_nbr").orderBy(col("order_ts").desc())
df1 = df.withColumn("row_num",row_number().over(windowDept))
But I am getting this as output:
Any help will be highly appreciated. Thanks.

 
    