I´m trying to do a linear regression on the results of a dataframe groupby by date and aggregate the results on another dataframe. So far I´m using an intermediate Series:
The dataframe is similar to
marker    date         variable       identifier  value
EA    2007-01-01      0.33            55          123
EA    2007-01-01      0.73            56          1123
EA    2007-01-01      0.51            57          123
EA    2007-02-01      0.13            55          4446
EA    2007-02-01      0.23            57          667
EA    2007-03-01      0.82            55          5675
EA    2007-03-01      0.88            56          1
EB    2007-01-01      0.13            45          123
EB    2007-01-01      0.74            46          33234
EB    2007-01-01      0.56            47          111
EB    2007-02-01      0.93            45          42657
EB    2007-02-01      0.23            47          12321355
EB    2007-03-01      0.82            45          9897
EB    2007-03-01      0.38            46          786
EB    2007-03-01      0.19            47          993845
And the code snippet:
import statsmodels as sm
import pandas as pd
reg_results = pd.Series(name='reg_results')
mean_results = pd.Series(name='mean_results')
for date, group in df.groupby(df.index.date):
    formula = sm.formula.ols('value ~ variable', data=group).fit()
    reg_results.set_value(date.strftime("%Y-%m-%d"), formula.params['Intercept'] + formula.params['variable']*group['variable'])
    mean_results.set_value(date.strftime("%Y-%m-%d"), group.mean()['variable'])
final_df = pd.DataFrame()
final_df = pd.concat([reg_results, mean_results], axis=1)
There are other operations like a second groupby on the group and so on, so I get to create one series per operation that I want to create, and this gets very complicated very fast. Is there a way to do this on one step, or at least without the intermediate series?
 
     
    