I have a dynamic dataset like the one below, which is updated every day. For example, on Jan 11 the data is:
| Name | Id |
|---|---|
| John | 35 |
| Marrie | 27 |
On Jan 12, the data is:
| Name | Id |
|---|---|
| John | 35 |
| Marrie | 27 |
| MARTIN | 42 |
I need to take the count of the records and append it to a separate dataset. For example, on Jan 11 my output dataset is:
| Count | Date |
|---|---|
| 2 | 11-01-2023 |
On Jan 12 my output dataset should be:
| Count | Date |
|---|---|
| 2 | 11-01-2023 |
| 3 | 12-01-2023 |
and so on for every day the code is run.
This has to be done using PySpark.
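In plain PySpark, the step I have in mind looks roughly like the sketch below; the source rows and the output path are placeholders for illustration only:

```python
from datetime import date

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder for the daily source data; in practice this would be read from
# the real dataset (e.g. spark.read.table(...) or spark.read.parquet(...)).
source_df = spark.createDataFrame(
    [Row(Name="John", Id=35), Row(Name="Marrie", Id=27), Row(Name="MARTIN", Id=42)]
)

# Build a one-row DataFrame holding today's record count and the run date
# formatted as DD-MM-YYYY.
count_row = spark.createDataFrame(
    [Row(Count=source_df.count(), Date=date.today().strftime("%d-%m-%Y"))]
)

# Append that row to the running output dataset (placeholder path).
count_row.write.mode("append").parquet("/tmp/daily_counts")
```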
I tried using `semantic_version` in the incremental transform, but it is not giving the desired result.
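For reference, the transform I am attempting has roughly the shape below (a sketch, not my exact code, assuming the Palantir Foundry transforms API; the dataset paths are placeholders, not my real ones):

```python
from datetime import date

from pyspark.sql import Row
from transforms.api import Input, Output, incremental, transform


# Placeholder dataset paths; the real transform points at my actual datasets.
@incremental(semantic_version=1)
@transform(
    out=Output("/Project/output/daily_counts"),
    source=Input("/Project/input/source_dataset"),
)
def compute(ctx, source, out):
    # Count the records in the source and pair the count with today's date.
    df = source.dataframe()
    count_row = ctx.spark_session.createDataFrame(
        [Row(Count=df.count(), Date=date.today().strftime("%d-%m-%Y"))]
    )
    # The intent is to append this single row to the output on every run,
    # so the output dataset grows by one row per day.
    out.write_dataframe(count_row)
```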