Let's say today is 20221031. Data is like follows:
| create_time | id | type | amount |
|---|---|---|---|
| 20221031 | 1 | A | 10 |
| 20221030 | 1 | A | 10 |
| 20221031 | 2 | A | 10 |
| 20221030 | 2 | B | 10 |
| 20220928 | 1 | A | 10 |
What I want:
| create_time | id | type | amount | sum_amount_within_one_month | sum_amount_within_three_months |
|---|---|---|---|---|---|
| 20221031 | 1 | A | 10 | 20(there are two records where id=1 and type=A within one month) | 30(there are three records where id=1 and type=A within 3 months) |
| 20221030 | 1 | A | 10 | 20 | 30 |
| 20221031 | 2 | A | 10 | 10 | 10 |
| 20221030 | 2 | B | 10 | 10 | 10 |
| 20220928 | 1 | A | 10 | 20 | 30 |
What I have tried:
df.withColumn("sum_amount_within_one_month", F.sum("amount").over(Window.partitionBy("id","type").where(df.create_date > "20220930")))
This fails with an error: WindowSpec object has no attribute 'where'