i have the following df:
 df <- data.frame(
      Company = rep(LETTERS[1:10], each = 10),
      Year = rep(2010:2019, times = 10),
      Var1 = runif(100, -1, 1),
      Var2 = runif(100, -1, 1),
      Var3 = runif(100, -1, 1),
      Var4 = runif(100, -1, 1),
      Var5 = runif(100, -1, 1)
    )
I am trying to run 10 ols regressions per year and adding all the coefficients, standard errors, t-statistic and p value into one big new dataframe. I came across this link and tried the following: Running several linear regressions from a single dataframe in R
coef_matrix <- df %>% group_by(Year) %>% do(tidy(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df)))
However, this gives me this output:
head(coef_matrix,15)
# A tibble: 15 x 6
# Groups:   Year [3]
    Year term        estimate std.error statistic p.value
   <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1  2010 (Intercept)   0.0274    0.0591     0.463  0.644 
 2  2010 Var2         -0.0339    0.113     -0.301  0.764 
 3  2010 Var3         -0.0705    0.112     -0.628  0.532 
 4  2010 Var4          0.0663    0.0973     0.682  0.497 
 5  2010 Var5          0.201     0.0983     2.04   0.0440
 6  2011 (Intercept)   0.0274    0.0591     0.463  0.644 
 7  2011 Var2         -0.0339    0.113     -0.301  0.764 
 8  2011 Var3         -0.0705    0.112     -0.628  0.532 
 9  2011 Var4          0.0663    0.0973     0.682  0.497 
10  2011 Var5          0.201     0.0983     2.04   0.0440
11  2012 (Intercept)   0.0274    0.0591     0.463  0.644 
12  2012 Var2         -0.0339    0.113     -0.301  0.764 
13  2012 Var3         -0.0705    0.112     -0.628  0.532 
14  2012 Var4          0.0663    0.0973     0.682  0.497 
15  2012 Var5          0.201     0.0983     2.04   0.0440
> 
I want to try and run 10 annual regressions for every year and then store all their coefficients, standard errors etcetera into a dataframe. Right now the coefficients are the same as if i would run a normal ols regression for all years together:
summary(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df))
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df)
Residuals:
     Min       1Q   Median       3Q      Max 
-1.10859 -0.51344 -0.05597  0.49481  1.13573 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.02736    0.05909   0.463    0.644  
Var2        -0.03392    0.11252  -0.301    0.764  
Var3        -0.07045    0.11223  -0.628    0.532  
Var4         0.06633    0.09730   0.682    0.497  
Var5         0.20064    0.09828   2.041    0.044 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5851 on 95 degrees of freedom
Multiple R-squared:  0.05331,   Adjusted R-squared:  0.01345 
F-statistic: 1.338 on 4 and 95 DF,  p-value: 0.2617
How do i change this? I thought the group_by function was supposed to run the annual regression, but it seems like i am making a mistake. So for every year, the coefficients, standard errors etcetera should be different (of course). So for 2010, the output should be this in the newly created df:
summary(lm(Var1 ~ Var2 + Var3 + Var4 + Var5, data=df[df$Year=="2010", ]))
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df[df$Year == 
    "2010", ])
Residuals:
       1       11       21       31       41       51       61       71       81       91 
-0.04871 -0.08103 -0.79737  0.10159  0.18243  0.41049 -0.05951 -0.12389  0.18528  0.23072 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.3293     0.2223  -1.481   0.1986  
Var2          0.8657     0.3569   2.426   0.0597 .
Var3          0.5305     0.4735   1.120   0.3134  
Var4         -0.2685     0.2446  -1.097   0.3225  
Var5         -0.1988     0.3729  -0.533   0.6167  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4389 on 5 degrees of freedom
Multiple R-squared:  0.612, Adjusted R-squared:  0.3017 
F-statistic: 1.972 on 4 and 5 DF,  p-value: 0.2372
And this for 2011, and so on:
Call:
lm(formula = Var1 ~ Var2 + Var3 + Var4 + Var5, data = df[df$Year == 
    "2011", ])
Residuals:
        2        12        22        32        42        52        62        72        82        92 
-0.541020 -0.626232  0.627003  0.326570  0.450827 -0.505888 -0.008373  0.264653  0.190374 -0.177914 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.23500    0.27901   0.842    0.438
Var2        -0.49973    0.59313  -0.843    0.438
Var3         0.21137    0.55453   0.381    0.719
Var4         0.34993    0.40960   0.854    0.432
Var5         0.09965    0.37772   0.264    0.802
Residual standard error: 0.597 on 5 degrees of freedom
Multiple R-squared:  0.2734,    Adjusted R-squared:  -0.3079 
F-statistic: 0.4703 on 4 and 5 DF,  p-value: 0.7577
 
     
    