I apologize if this question is poorly worded, but after hours of searching the web I have not found an existing answer. I will do my best to describe the problem in detail.
Data set summary:
The data is financial data (Open, High, Low, Close) that was retrieved with Python code and stored in individual CSV files. The files were then read and stored using lapply. To keep things simple, all I am focusing on currently is the daily percentage change, (Close/shift(Close))-1. For the purposes of this problem, I have removed all NAs as well as tickers with incomplete data.
I have a data frame (converted from a list) with 98 columns (the tickers) and 1000 rows (the days). Each value in the data frame is the daily percentage change for that ticker on that day.
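For reference, the daily percentage change described above could be computed in base R roughly as follows. This is only a sketch: the vector name `close` and its values are hypothetical, and the real data may have been computed differently.

```r
# Hypothetical closing prices for a single ticker (illustrative values only)
close <- c(100, 102, 101, 105)

# Daily percentage change: today's close divided by the previous close, minus 1.
# The first day has no previous close, so the result is one element shorter.
pct_chg <- close[-1] / close[-length(close)] - 1
```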
Objective:
I want to know how to fit lm() once per column, referencing the response column's name dynamically and using ALL other columns as predictors (~ .).
Sample data set:
aapl_pct_chg <- c(.02, .03, .01, -.05, -.01)
tmus_pct_chg <- c(-.01, -.02, .05, .01, -.03)
akam_pct_chg <- c(.1, -.2, .3, -.03, -.07)
intc_pct_chg <- c(.01, .03, .02, .01, .1)
de_pct_chg <- c(-.01, -.05, .05, .1, -.03)
df <- as.data.frame(cbind(aapl_pct_chg, tmus_pct_chg, akam_pct_chg, intc_pct_chg, de_pct_chg))
names(df) <- c("AAPL", "TMUS", "AKAM", "INTC", "DE")
It is simple enough to do the following:
lm_aapl <- lm(AAPL ~ ., data=df)
But I have been unable to find a way to reference the column name DYNAMICALLY without running into errors. Ideally, I could run one piece of code that fits an lm() model for each column, using every other column as a predictor.
Some answered questions have HELPED (and I apologize, I am unorganized and have tried this in 500 different ways), but none have solved it. The closest I have come is code that does what I want, except that it includes AAPL's own values as a predictor when predicting AAPL, which leads to a good-looking model but not the one I want.
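To make the goal concrete, here is a minimal sketch of the kind of loop described above, using the sample data. It assumes that building the formula with `reformulate()` from each column name is acceptable; the names `models` and `target` are made up for illustration, and this is one possible approach rather than a confirmed solution. Because `.` in an lm() formula expands to every column of `data` not already in the formula, the response column should not appear among its own predictors.

```r
# Sample data from the question, built directly as a data frame
df <- data.frame(
  AAPL = c(.02, .03, .01, -.05, -.01),
  TMUS = c(-.01, -.02, .05, .01, -.03),
  AKAM = c(.1, -.2, .3, -.03, -.07),
  INTC = c(.01, .03, .02, .01, .1),
  DE   = c(-.01, -.05, .05, .1, -.03)
)

# Fit one model per column: the current column is the response,
# and "." stands for all remaining columns in df.
# setNames() makes the resulting list addressable by ticker.
models <- lapply(setNames(names(df), names(df)), function(target) {
  lm(reformulate(".", response = target), data = df)
})

# Inspect the coefficients of the model that predicts AAPL
coef(models$AAPL)
```

With this layout, `models[["TMUS"]]` would hold the fit for TMUS, and `lapply(models, summary)` would summarize every fit in one call.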