Predict using felm output with standard errors

Question

Is there way to get predict behavior with standard errors from lfe::felm if the fixed effects are swept out using the projection method in felm? This question is very similar to the question here, but none of the answers to that question can be used to estimate standard errors or confidence/prediction intervals. I know that there's currently no predict.felm, but I am wondering if there are workarounds similar to those linked above that might also work for estimating the prediction interval

library(DAAG)
library(lfe)

model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit
# Result:        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

model2 <- felm(data = cps1, re74 ~ age | nodeg + marr)
predict(model2, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit
# Does not work

The goal is to estimate a prediction interval for yhat, for which I think I'd need to compute the full variance-covariance matrix (including the fixed effects). I haven't been able to figure out how to do this, and I'm wondering if it's even computationally feasible.

pbaylis · Accepted Answer · 2018-02-15T18:55:31.197

After conversations with several people, I don't believe it is possible to obtain an estimate the distribution of yhat=Xb (where X includes both the covariates and the fixed effects) directly from felm, which is what this question boils down to. It is possible bootstrap them, however. The following code does so in parallel. There is scope for performance improvements, but this gives the general idea.

Note: here I do not compute full prediction interval, just the SEs on Xb, but obtaining the prediction interval is straightforward - just add the root of sigma^2 to the SE.

library(DAAG)
library(lfe)
library(parallel)

model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
yhat_lm <- predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T)

set.seed(42)
boot_yhat <- function(b) {
  print(b)
  n <- nrow(cps1)
  boot <- cps1[sample(1:n, n, replace=T),]

  lm.model <- lm(data=demeanlist(boot[, c("re74", "age")], list(factor(boot$nodeg), factor(boot$marr))), 
                 formula = re74 ~ age)
  fe <- getfe(felm(data = boot, re74 ~ age | nodeg + marr))

  bootResult <- predict(lm.model, newdata = data.frame(age = 40)) + 
    fe$effect[fe$fe == "nodeg" & fe$idx==0] + 
    fe$effect[fe$fe == "marr" & fe$idx==1]
  return(bootResult)
}

B = 1000
yhats_boot <- mclapply(1:B, boot_yhat)

plot(density(rnorm(10000, mean=yhat_lm$fit, sd=yhat_lm$se.fit)))
lines(density(yhats), col="red")

jay.sf · Answer 2 · 2018-02-12T11:04:31.753

From your first model predict(.) yields this:

#        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

Following 李哲源 we can achieve these results manually, too.

beta.hat.1 <- coef(model1)  # save coefficients
# model matrix: age=40, nodeg = 0, marr=1:
X.1 <- cbind(1, matrix(c(40, 0, 1), ncol=3))  
pred.1 <- as.numeric(X.1 %*% beta.hat.1) # prediction
V.1 <- vcov(model1)  # save var-cov matrix
se2.1 <- unname(rowSums((X.1 %*% V.1) * X.1))  # prediction var
alpha.1 <- qt((1-0.95)/2, df = model1$df.residual)  # 5 % level
pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1)  # 95%-CI
# [1] 18258.18 18614.18
sigma2.1 <- sum(model1$residuals ^ 2) / model1$df.residual  # sigma.sq
PI.1 <- pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1 + sigma2.1) # prediction interval
matrix(c(pred.1, PI.1), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))
#        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

Now, your linked example applied to multiple FE, we get this results:

lm.model <- lm(data=demeanlist(cps1[, c(8, 2)], 
                               list(as.factor(cps1$nodeg), 
                                    as.factor(cps1$marr))), re74 ~ age)
fe <- getfe(model2)
predict(lm.model, newdata = data.frame(age = 40)) + fe$effect[fe$idx=="1"]
# [1] 15091.75 10115.21

The first value is with and the second without added FE (try fe$effect[fe$idx=="1"]).

Now we're following the manual approach above.

beta.hat <- coef(model2)  # coefficient
x <- 40  # age = 40
pred <- as.numeric(x %*% beta.hat)  # prediction
V <- model2$vcv  # var/cov
se2 <- unname(rowSums((x %*% V) * x))  # prediction var
alpha <- qt((1-0.95)/2, df = model2$df.residual)  # 5% level
pred + c(alpha, -alpha) * sqrt(se2)  # CI
# [1]  9599.733 10630.697
sigma2 <- sum(model2$residuals ^ 2) / model2$df.residual  # sigma^2
PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
matrix(c(pred, PI), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))  # output
#        fit       lwr      upr
# 1 10115.21 -5988.898 26219.33

As we see, the fit is the same as the linked example approach, but now with prediction interval. (Disclaimer: The logic of the approach should be straightforward, the values of the PI should still be evaluated, e.g. in Stata with reghdfe.)

Edit: In case you want to achieve exactly the same output from felm() which predict.lm() yields with the linear model1, you simply need to "include" again the fixed effects in your model (see model3 below). Just follow the same approach then. For more convenience you easily could wrap it into a function.

library(DAAG)
library(lfe)

model3 <- felm(data = cps1, re74 ~ age + nodeg + marr)

pv <- c(40, 0, 1)  # prediction x-values

predict0.felm <- function(mod, pv.=pv) {
  beta.hat <- coef(mod)  # coefficient
  x <- cbind(1, matrix(pv., ncol=3))  # prediction vector
  pred <- as.numeric(x %*% beta.hat)  # prediction
  V <- mod[['vcv'] ] # var/cov
  se2 <- unname(rowSums((x %*% V) * x))  # prediction var
  alpha <- qt((1-0.95)/2, df = mod[['df.residual']])  # 5% level
  CI <- structure(pred + c(alpha, -alpha) * sqrt(se2), 
                  names=c("CI lwr", "CI upr"))  # CI
  sigma2 <- sum(mod[['residuals']] ^ 2) / mod[['df.residual']] # sigma^2
  PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
  mx <- matrix(c(pred, PI), nrow = 1, 
               dimnames = list(1, c("PI fit", "PI lwr", "PI upr")))  # output
  list(CI, mx)
}

predict0.felm(model3)[[2]]
#     PI fit   PI lwr   PI upr
# 1 18436.18 2339.335 34533.03

By this with felm() you can achieve the same prediction interval as with predict.lm().

Thanks for this answer, this helped clarify my thinking and also helped me find some errors in my MWE. But I don't think this answers the question I posed. I would like to find a technique to obtain the same prediction interval from felm as obtained using `lm` and `predict.lm`. The answer you've given only uses the var/cov matrix for the non-fixed effect components in `felm`. I think the fundamental question here is whether it's possible to estimate the full var/cov matrix for all covariates, including fixed effects, when using `felm`. — pbaylis, Feb 12 '18 at 04:48
Unfortunately, the issue with including the fixed effects in the VCV is that I'm running `felm` instead of `lm` because the fixed effects in my actual problem have too many factors to estimate in memory - this working example is on a small dataset, but I am imagining a dataset with millions or billions of observations and hundreds of thousands of fixed effects. So an assumption of the problem is that the fixed effects have to be swept away by `felm`. — pbaylis, Feb 12 '18 at 16:36
Perhaps you find the `lfe::fevcov()` useful then, which throws the FE vcv matrix. — jay.sf, Feb 12 '18 at 16:49
I looked at that function, but it only includes the vcov of the FE, while I need the full variance-covariance matrix (including the covariances between the covariate and the FE) to get the correct prediction interval. — pbaylis, Feb 12 '18 at 18:43

Predict using felm output with standard errors

2 Answers2

Linked