I have a dataframe that looks like this:
| index | period | category |
|---|---|---|
| 1 | 20181231 | 1 |
| 2 | 20181231 | 2 |
| 3 | 20181231 | 3 |
| 4 | 20190131 | 1 |
| 5 | 20190131 | 2 |
| 6 | 20190131 | 2 |
I want to get the following dataframe:
| index | period | category | category_count | period_count |
|---|---|---|---|---|
| 1 | 20181231 | 1 | 1 | 3 |
| 2 | 20181231 | 2 | 1 | 3 |
| 3 | 20181231 | 3 | 1 | 3 |
| 4 | 20190131 | 1 | 1 | 3 |
| 4 | 20190131 | 2 | 2 | 3 |
I tried to use various group by and aggregate logic but I always end up that period_count equals to category_count since the group by and aggregate will only aggregate through both groups (which are period and category in that case).
Is there a way to do a "nested" group by where one aggregation is done through both groups and the other is done through the first one?