Edited to show code I already have...
My job is to predict how much money a movie will make over its first 15 weeks on cable platforms. I do this by fitting a separate regression for each of the first 14 weeks, and I need to automate the steps involved in each regression:
- Subset the total data set by week (14 weeks in total), giving 14 distinct data frames.

    df.names = paste("data", 1:14, sep = "")
    for (i in 1:14) {
      d.frame = subset(myData, Week == i)
      assign(df.names[i], d.frame)
    }
- Subset each week's data frame into training and test sets.

    set.seed(101)
    train_idx = sample(1:nrow(data1), round(0.7 * nrow(data1)), replace = FALSE)
    data1_train = data1[train_idx, ]
    data1_test = data1[-train_idx, ]
- Run a linear regression on the training set for each week.

    Week1_Regress = lm(x ~ coef1 + coef2 + ... + coefi, data = data1_train)
- Extract the coefficients for each regression into a CSV file.

    write.csv(Week1_Regress$coef, "Selected Folder")
- Calculate the RMSE using the test set and extract that into a CSV.

    test = predict(Week1_Regress, data1_test)
    rmse = function(test, obs) {
      sqrt(sum((obs - test)^2) / length(test))
    }
I can do each step individually, but I am looking for a loop or lapply solution so that I don't have to type out 14 versions of the 5 steps.
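For reference, here is a rough sketch of the shape such a solution might take. It is only an illustration under assumptions: `myData` is replaced by simulated data, the column names `Revenue`, `Budget` and `Screens` and the output file names are made up, and the files go to a temporary directory. The idea is to `split()` the data into a list with one element per week and `lapply()` a single function over it, instead of creating 14 named data frames with `assign()`.

```r
# Hypothetical stand-in for myData: 14 weeks, 50 rows each.
# Revenue, Budget and Screens are made-up column names for illustration.
set.seed(101)
myData <- data.frame(
  Week    = rep(1:14, each = 50),
  Budget  = runif(700, 1, 100),
  Screens = runif(700, 10, 500)
)
myData$Revenue <- 2 * myData$Budget + 0.5 * myData$Screens + rnorm(700)

# Root mean squared error between predictions and observed values.
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# One list element per week instead of 14 separately named data frames.
by_week <- split(myData, myData$Week)

# Split one week's data 70/30, fit the model, return coefficients and RMSE.
fit_week <- function(d) {
  train_idx <- sample(seq_len(nrow(d)), round(0.7 * nrow(d)))
  train <- d[train_idx, ]
  test  <- d[-train_idx, ]
  fit   <- lm(Revenue ~ Budget + Screens, data = train)
  list(coef = coef(fit),
       rmse = rmse(predict(fit, newdata = test), test$Revenue))
}

results <- lapply(by_week, fit_week)

# Collect everything into one table each; rows correspond to weeks.
coef_table <- do.call(rbind, lapply(results, `[[`, "coef"))
rmse_table <- data.frame(Week = as.integer(names(results)),
                         RMSE = sapply(results, `[[`, "rmse"))

out_dir <- tempdir()  # hypothetical output location
write.csv(coef_table, file.path(out_dir, "weekly_coefficients.csv"))
write.csv(rmse_table, file.path(out_dir, "weekly_rmse.csv"), row.names = FALSE)
```

Writing one combined table for the coefficients and one for the RMSE values, rather than 14 files of each, is a design choice in this sketch; calling `write.csv()` inside `fit_week()` with a per-week file name would give separate files instead.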
