Assume I have the following DataFrame:
import numpy as np
import pandas as pd

# 72 hourly timestamps starting at 2011-01-01 00:00
rng = pd.date_range('1/1/2011', periods=72, freq='H')
np.random.seed(10)
n = 10
df = pd.DataFrame(
    {
        "datetime": np.random.choice(rng, n),
        "cat": np.random.choice(['a', 'b', 'b'], n),
        "val": np.random.randint(0, 5, size=n),
    }
)
If I now group by cat and datetime:
gb = df.groupby(['cat','datetime']).sum()
I get the totals for each cat for each hour:
cat datetime            val
a   2011-01-01 00:00:00 1
    2011-01-01 09:00:00 3
    2011-01-02 16:00:00 1
    2011-01-03 16:00:00 1
b   2011-01-01 08:00:00 4
    2011-01-01 15:00:00 3
    2011-01-01 16:00:00 3
    2011-01-02 04:00:00 4
    2011-01-02 05:00:00 1
    2011-01-02 12:00:00 4
However, I would like to have something like:
cat datetime   val
a   2011-01-01 4
    2011-01-02 1
    2011-01-03 1
b   2011-01-01 10
    2011-01-02 9
I could get the desired result by adding another column called date:
df['date'] = df['datetime'].dt.date
and then doing a similar groupby: df.groupby(['cat','date']).sum(). But is there a more Pythonic way to do this? In addition, I might want to aggregate at the month or year level. What would be the right way?
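For reference, here is a minimal sketch of one way this might be done without a helper column. It assumes a small hand-built frame whose rows are copied from the hourly totals shown above (not the seeded random data), and it uses pd.Grouper for the daily level and the .dt accessor for the month and year levels. The names daily, monthly and yearly are only illustrative.

```python
import pandas as pd

# Hand-built data mirroring the hourly totals shown in the question
# (an assumption for illustration, not the seeded random frame).
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2011-01-01 00:00", "2011-01-01 09:00",
                "2011-01-02 16:00", "2011-01-03 16:00",
                "2011-01-01 08:00", "2011-01-01 15:00",
                "2011-01-01 16:00", "2011-01-02 04:00",
                "2011-01-02 05:00", "2011-01-02 12:00",
            ]
        ),
        "cat": ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"],
        "val": [1, 3, 1, 1, 4, 3, 3, 4, 1, 4],
    }
)

# Day level: bin the datetime column into calendar days while grouping by cat.
daily = df.groupby(["cat", pd.Grouper(key="datetime", freq="D")])["val"].sum()

# Month level: convert each timestamp to a monthly Period and group on that.
monthly = df.groupby(["cat", df["datetime"].dt.to_period("M")])["val"].sum()

# Year level: group on the integer year extracted from each timestamp.
yearly = df.groupby(["cat", df["datetime"].dt.year])["val"].sum()

print(daily)
print(monthly)
print(yearly)
```

The Grouper version keeps real timestamps in the index, which may be convenient for later resampling or plotting, whereas to_period and .dt.year give period or integer labels instead. Which one is preferable probably depends on what is done with the result afterwards.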