I'm interested in computing a statistic over a rolling window. The statistic will be computed over multiple columns. Here is a toy example calculating regression coefficients over a rolling window.
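import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def regression_coef(df):
    # An empty window has nothing to fit; return NaN for both coefficients.
    if df.shape[0] == 0:
        return np.array([np.nan, np.nan])
    y = df.y.values
    X = df.drop('y', axis=1).values
    # Fit y on the remaining columns and return the rounded coefficients.
    reg = LinearRegression().fit(X, y).coef_.round(2)
    return reg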
time = np.arange(5,3605,5)
x = np.random.normal(size = time.size)
z = np.random.normal(size = time.size)
y = 2*x+z + np.random.normal(size = time.size)
df = pd.DataFrame({'x':x, 'z':z, 'y':y}, index = pd.to_datetime(time, unit ='s'))
When I call df.rolling('20 T').apply(regression_coef) I get the following error: AttributeError: 'numpy.ndarray' object has no attribute 'y'. This leads me to believe that df.rolling computes the statistic over each column individually, rather than passing all observations within the 20-minute window to the function.
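To check that suspicion, here is a small probe (a sketch only; the names `small`, `probe` and `seen` are made up for illustration) that records what rolling.apply actually hands to the function. It tries both settings of the `raw` parameter, since which one is the default depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Small two-column frame on a 5-second datetime index (illustrative data).
idx = pd.to_datetime(np.arange(5, 65, 5), unit="s")
small = pd.DataFrame({"x": np.arange(12.0), "y": np.arange(12.0) * 2}, index=idx)

seen = set()

def probe(window):
    # Record the type and dimensionality of whatever apply passes in.
    seen.add((type(window).__name__, window.ndim))
    return window.sum()

# raw=True passes a NumPy array, raw=False passes a Series.
result_raw = small.rolling("20s").apply(probe, raw=True)
result_series = small.rolling("20s").apply(probe, raw=False)

print(seen)
```

In both cases the function only ever sees a 1-dimensional slice of a single column, which is consistent with the AttributeError above: there is no 'y' column to look up inside the window.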
How can I achieve what I want? That is to say, how can I compute regression_coef over a rolling window? In particular, I'm interested in whether this can be solved using offsets and the existing pandas API.
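For reference, one possible direction, sketched under assumptions rather than offered as a definitive solution: in pandas versions where the rolling object supports iteration (1.1 or later, as far as I know), each step of the loop yields the whole multi-column window as a DataFrame, so offsets like '20min' still work. The helper name `rolling_coefs` and its parameters are hypothetical, and `np.linalg.lstsq` with an explicit intercept column is swapped in for `LinearRegression` only to keep the sketch free of the scikit-learn dependency.

```python
import numpy as np
import pandas as pd

def rolling_coefs(frame, window, target="y"):
    """Fit target ~ other columns on each time-based window (hypothetical helper)."""
    features = [c for c in frame.columns if c != target]
    rows = []
    # Iterating a rolling object yields one DataFrame per row of `frame`,
    # containing every column for the observations inside that window.
    for win in frame.rolling(window):
        # Too few observations to fit an intercept plus one slope per feature.
        if len(win) <= len(features):
            rows.append(np.full(len(features), np.nan))
            continue
        # Prepend a column of ones so the least-squares fit includes an intercept.
        X = np.column_stack([np.ones(len(win)), win[features].to_numpy()])
        beta, *_ = np.linalg.lstsq(X, win[target].to_numpy(), rcond=None)
        rows.append(beta[1:])  # drop the intercept, keep the slopes
    return pd.DataFrame(np.vstack(rows), index=frame.index, columns=features)
```

With the df defined above, rolling_coefs(df, '20min') would return one row of coefficients per timestamp. This is a plain Python loop, so it will be slower than a built-in rolling aggregation.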