I have a dataset that shows the number of meals in each day. Column Week shows which week of the month that date belongs to. Please see below an example of the datase:
Id.        date.               Meals   Week
 1        2020-02-23            1         4
 1        2020-02-24            1         5
 1        2020-02-25            2         5
 1        2020-02-27            1         5
 1        2020-01-03            2         1
...         ...                ...       ...
 2        2020-03-04            3         2
 2        2020-03-05            4         2
 2        2020-03-06            3         2
 2        2020-03-07            1         2
 2        2020-03-08            2         2
I am grouping the data based on Participant ID and Week in order to get the average number of meals for each participant in each week. Plz see below:
d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')
                                 Meals
               ID          Week
                1           4   1.400000
                            5   1.333333
                            1   2.000000
                            2   1.250000
                            3   1.000000
                2           2   2.000000
                            3   2.142857
                            4   2.500000
                            5   2.500000
                3           2   2.555556
                            3   2.600000
                            4   1.833333
                            5   2.000000
                            1   2.000000
My first question:
- if max number of meals for each participant was in the first or last week.
After Richie answer, the output of:
print(df.head(50).to_dict('split')
Is
{'index': [('"55eb3fea748092000daa9b25"', 4), ('"55eb3fea748092000daa9b25"', 5), ('"55eb3fea748092000daa9b25"', 1), ('"55eb3fea748092000daa9b25"', 2), ('"55eb3fea748092000daa9b25"', 3)], 'columns': ['Meals'], 'data': [[1.4], [1.3333333333333333], [2.0], [1.25], [1.0]]}
My second question (after updating my post) is:
- which (week of the study) the max week was? Note that the study was performed for 1-4/5 weeks. So the output would look like this, with an extra column called (Week of the study):
Output:
                                Meals      max_week
Id  Week    Week of the study       
1    4            1               1          FALSE
     5            2               1          FALSE
     1            3               2          TRUE
     2            4               1          FALSE
     3            5               1          FALSE
2    2            1               2          FALSE
     3            2               2          FALSE
     4            3               2          TRUE
     5            4               2          TRUE
3    2            1               2          FALSE
     3            2               2          TRUE
     4            3               2          FALSE
     5            4               3          FALSE
     1            5               3          FALSE
Then I want only to save the ID, Week of the study and Snacks only for the true max weeks as follows:
                                 Meals    max_week
ProlificId  Week of the study       
    1              3                2       TRUE
    2              3                2       TRUE
    2              4                2       TRUE
    3              2                2       TRUE
Your help is highly appreciated Shosho
 
    