I would like to do multi-column operations (e.g. the correlation below) as well as operations that use the results of previous calculations (e.g. the min_diff calculation below) without a for loop, using native pandas functions such as groupby and agg. Is this possible?
import pandas as pd
import datetime
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'date': [datetime.datetime(2010,1,1)+datetime.timedelta(days=i*15) 
                            for i in range(0,100)],
                   'invested': np.random.random(100)*1e6,
                   'return': np.random.random(100),
                   'side': np.random.choice([-1, 1], 100)})
df['year'] = df['date'].dt.year
# want to get rid of the for loop below
ret_year = []
for year in df['year'].unique():
    df_this_year = df[df['year'] == year]
    min_short = df_this_year[df_this_year['side'] == -1]['return'].max()
    min_long = df_this_year[df_this_year['side'] == -1]['return'].min()
    min_diff = min_long - min_short
    avg_inv = df_this_year['invested'].mean()
    corr = np.correlate(df_this_year['invested'], df_this_year['return'])[0]
    ret_year.append({'year': year, 'min_short': min_short, 'min_long': min_long,
                     'min_diff': min_diff, 'avg_inv': avg_inv, 'corr': corr})
print(pd.DataFrame(ret_year))
Result:
         avg_inv          corr  min_diff  min_long  min_short  year
0  590766.254452  8.821215e+06 -0.664752  0.297437   0.962189  2010
1  490224.532564  6.122306e+06 -0.900289  0.019193   0.919483  2011
2  438330.806563  4.768964e+06 -0.929680  0.069167   0.998847  2012
3  373038.880789  4.677380e+06 -0.779678  0.164694   0.944372  2013
4  416817.752705  5.014249e+04  0.000000  0.434417   0.434417  2014
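One loop-free direction, offered as a sketch rather than a definitive answer: group by year and pass each group to a function that returns a Series, so that multi-column statistics and values derived from earlier ones can be computed together. The helper name summarize_year is hypothetical, and the calculations deliberately mirror the loop above as written (both min_short and min_long filter on side == -1) so that the output stays comparable.

```python
import datetime

import numpy as np
import pandas as pd

# Same setup as in the question, repeated so the sketch runs on its own.
np.random.seed(0)
df = pd.DataFrame({
    'date': [datetime.datetime(2010, 1, 1) + datetime.timedelta(days=i * 15)
             for i in range(100)],
    'invested': np.random.random(100) * 1e6,
    'return': np.random.random(100),
    'side': np.random.choice([-1, 1], 100),
})
df['year'] = df['date'].dt.year


def summarize_year(g):
    """Hypothetical helper: the loop body applied to one year's rows."""
    # Mirrors the original loop: both statistics use the side == -1 rows.
    short_returns = g.loc[g['side'] == -1, 'return']
    min_short = short_returns.max()
    min_long = short_returns.min()
    return pd.Series({
        'min_short': min_short,
        'min_long': min_long,
        # Depends on the two values computed just above.
        'min_diff': min_long - min_short,
        'avg_inv': g['invested'].mean(),
        # Multi-column operation: needs both 'invested' and 'return'.
        'corr': np.correlate(g['invested'].to_numpy(),
                             g['return'].to_numpy())[0],
    })


# Selecting the columns explicitly keeps the grouping key out of each group.
result = (df.groupby('year')[['invested', 'return', 'side']]
            .apply(summarize_year)
            .reset_index())
print(result)
```

Because apply calls a Python function once per group, this removes the explicit loop but is not fully vectorized; single-column statistics such as avg_inv could instead go through groupby(...).agg and be joined afterwards.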