Sum values in two columns based on the first record of a group in another column

Question

Let's assume I have a pandas dataframe like this:

import pandas as pd
d = {'entrance': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'e', 'e', 'e', 'e'], 
      'hour': [3, 4, 4, 4, 2, 2, 2, 2, 0, 0, 0, 1, 12, 12, 12, 12, 7, 7, 8, 8],
      'in': [3, 2, 0, 4, 3, 2, 1, 0, 2, 1, 5, 0, 1, 1, 2, 3, 10, 10, 1, 5],
      'out': [0, 1, 0, 0, 1, 2, 1, 3, 1, 1, 0, 2, 0, 2, 3, 0, 0, 0, 25, 0]}
df = pd.DataFrame(data=d)
df

    entrance    hour    in  out
    a           3       3   0
    a           4       2   1
    a           4       0   0
    a           4       4   0
    b           2       3   1
    b           2       2   2
    b           2       1   1
    b           2       0   3
    c           0       2   1
    c           0       1   1
    c           0       5   0
    c           1       0   2
    d           12      1   0
    d           12      1   2
    d           12      2   3
    d           12      3   0
    e           7       10  0
    e           7       10  0
    e           8       1   25
    e           8       5   0

I would like to sum the values in the columns in and out grouped by hour. I know I can achieve this with:

df_grouped = df.groupby(['hour']).agg({'in':'sum', 'out': 'sum'}).reset_index()

However, the values in the columns in and out should only be grouped by the first hour record of each entrance group. So for entrance group 'a', the values in and out should be summed for the first record of the corresponding hours, which is in this case hour '3'. The desired output should be:

hour   in    out
3      9     1
2      6     7
0      8     4
12     7     5
7      27    25

What is the best way to do this?

Need `df_grouped = df.groupby(['entrance']).agg({'hour':'first', 'in':'sum', 'out': 'sum'}).reset_index(drop=True)` ? — jezrael, Oct 03 '22 at 12:44

Sum values in two columns based on the first record of a group in another column

0 Answers0