In the minimal example below, I am trying to use the value of a character string, vars, in a regression formula. However, I am only able to pass the literal string of variable names ("v2+v3+v4") to the formula, not the columns those names refer to (e.g., "v2" should mean dat$v2).
I know there are better ways to run the regression (e.g., lm(v1 ~ v2 + v3 + v4, data=dat)). My situation is more complex, and I am trying to figure out how to use a character string in a formula. Any thoughts?
Updated code below:
# minimal example
# create data frame
v1 <- rnorm(10)
v2 <- sample(c(0,1), 10, replace=TRUE)
v3 <- rnorm(10)
v4 <- rnorm(10)
dat <- cbind(v1, v2, v3, v4)
dat <- as.data.frame(dat)
# create objects of column names
c.2 <- colnames(dat)[2]
c.3 <- colnames(dat)[3]
c.4 <- colnames(dat)[4]
# shortcut to get to the type of object my full code produces
vars <- paste(c.2, c.3, c.4, sep="+")
### TRYING TO SOLVE FROM THIS POINT:
print(vars)
# [1] "v2+v3+v4"
# use vars in regression
regression <- paste0("v1", " ~ ", vars)
m1 <- lm(as.formula(regression), data=dat)
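For comparison, the same formula could probably also be built without pasting a single string, using base R's `reformulate()`, which takes a character vector of term labels. This is only a sketch: the data frame is rebuilt here so the block runs on its own, `v2` is made deterministic for illustration, and the name `rhs` is a hypothetical helper that does not appear in my real code.

```r
# illustrative data with the same column names as dat above
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = rep(c(0, 1), 5),   # deterministic 0/1 column (assumption)
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# character vector of predictor names (hypothetical helper name)
rhs <- colnames(dat)[2:4]

# reformulate() turns the strings into the formula v1 ~ v2 + v3 + v4
f <- reformulate(rhs, response = "v1")
m1 <- lm(f, data = dat)
```

Either route ends with a formula object, so `lm()` resolves `v2`, `v3`, and `v4` as columns of `dat`.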
Update:
@Arun was correct about the missing quotes ("") around v1 in the first example. That fixed my example, but I was still having problems with my real code. In the code chunk below, I adapted my example to better reflect my actual code. I originally created a simpler example because I thought the problem was the string vars.
Here's an example that does not work :) It uses the same data frame dat created above.
dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]
# the following loop creates objects r3, r4, r5, and r6
# r5 and r6 are interaction terms
for (v in 3:4) {
  r <- colnames(dat)[v]
  assign(paste("r", v, sep=""), r)
  r <- paste(colnames(dat)[2], colnames(dat)[v], sep="*")
  assign(paste("r", v+2, sep=""), r)
}
# combine r3, r4, r5, and r6 then collapse and remove trailing +
vars2 <- sapply(3:6, function(i) {
  paste0("r", i, "+")
})
vars2 <- paste(vars2, collapse = '')
vars2 <- substr(vars2, 1, nchar(vars2)-1)
# concatenate dv, r2 (as a factor), and vars into `eq`
eq <- paste0(dv, " ~ factor(",r2,") +", vars2)
Here is the issue:
print(eq)
# [1] "v1 ~ factor(v2) +r3+r4+r5+r6"
Unlike regression in the first example, eq does not bring in the column names (e.g., v3). The object names (e.g., r3) are retained. As such, the following lm() command does not work.
m2 <- lm(as.formula(eq), data=dat)
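To make the distinction between object names and their values concrete, here is a minimal sketch of the behaviour I am after. It assumes `r3` to `r6` hold the strings that the loop in my example creates, rebuilds the data frame so the block runs on its own, and uses base R's `mget()` to look objects up by name. `term_names` and `term_values` are hypothetical names, and this may well not be the best approach.

```r
# illustrative data with the same column names as dat above
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = rep(c(0, 1), 5),   # deterministic 0/1 column (assumption)
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# the strings the loop in the example stores in r3..r6
r3 <- "v3"
r4 <- "v4"
r5 <- "v2*v3"
r6 <- "v2*v4"

# paste0("r", 3:6) only yields the object NAMES ("r3", "r4", ...)
term_names <- paste0("r", 3:6)

# mget() looks up the VALUES stored under those names
term_values <- unlist(mget(term_names))

# collapse the values, not the names, into the right-hand side
vars2 <- paste(term_values, collapse = "+")
eq <- paste0("v1 ~ factor(v2) + ", vars2)

m2 <- lm(as.formula(eq), data = dat)
```

With the values substituted, `eq` contains only column names of `dat`, so `as.formula()` gives `lm()` terms it can find. Because `factor(v2)` and `v2` (from the `*` terms) carry the same information here, `lm()` reports `NA` for the redundant coefficient.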