I have a dataframe which I named parking which has multiple columns, in this case Registration State, Violation Code, and Summons Number.
For each Registration State, I want the 3 Violation Codes which the highest row count. The best I've been able to get is:
parking_state_group = parking.groupby(['Registration State', 'Violation Code'])['Summons Number'].count()
When printed (i.e. print(parking_state_group.reset_index()) looks like:
     Registration State  Violation Code  Summons Number
0                    99               0              14
1                    99               6               1
2                    99              10               6
3                    99              13               2
4                    99              14              75
...                 ...             ...             ...
1811                 WY              37               3
1812                 WY              38               4
1813                 WY              40               4
1814                 WY              46               1
1815                 WY              68               1
This at least gets me the count of each Violation Code for each state (Summons Number is like an ID field for each row). I want this to return only the 3 violation codes for each state with the highest count, so something like:
      Registration State  Violation Code  Summons Number
0                    99               14             75
1                    99               31             61
2                    99               87             55
...                 ...             ...             ...
1812                 WY              38               4
1813                 WY              40               4
1811                 WY              37               3
I've tried .nlargest() but this doesn't seem to get the largest .count(), only the largest values within a column, which isn't what I'm looking for.
 
    