My dataset is too big and my formula too complicated to run biglm, fastLm, speedlm or lm in one go. Therefore I've resorted to splitting my dataset into smaller pieces and updating the model with every 50,000 rows.
Below is a simplified version of what I'm using, with the iris dataset standing in for my own.
library(speedglm)
chunk1 <- iris[1:10,]
chunk2 <- iris[11:20,]
chunk3 <- iris[21:30,]
# Fit on the first chunk, then fold each remaining chunk into the running fit
lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species, chunk1)
for (chunk in list(chunk2, chunk3)) {
  lmfit <- updateWithMoreData(lmfit, chunk)
}
lmfit
Fitting the model in chunks like this gets me the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
- Changing the formula is not an option, as each effect is relevant.
- Making the 'smaller pieces' bigger is not an option, as the dataset will get too big and slow down performance.
- I have no clue which columns cause the error, and the offending columns may differ from chunk to chunk.
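
One way to find out which columns are affected in a given chunk might be a check along these lines. This is only a sketch: `single_level_columns` is a made-up helper, not part of speedglm, and it assumes the error comes from factor or character columns that end up with fewer than two levels within a chunk. The `bad_chunk` below is constructed purely for illustration.

```r
# Hypothetical helper (not part of speedglm): returns the names of columns
# that would trip the "2 or more levels" contrasts check, i.e. factors with
# fewer than two levels, or character columns with fewer than two distinct
# non-missing values.
single_level_columns <- function(chunk) {
  too_few <- vapply(chunk, function(x) {
    if (is.factor(x)) {
      nlevels(x) < 2L
    } else if (is.character(x)) {
      length(unique(x[!is.na(x)])) < 2L
    } else {
      FALSE
    }
  }, logical(1))
  names(chunk)[too_few]
}

# Illustration: rebuild Species from the rows present, so the factor keeps
# only the single level "setosa" (as can happen when a chunk is read in
# separately from the rest of the data).
bad_chunk <- iris[1:10, ]
bad_chunk$Species <- factor(as.character(bad_chunk$Species))

single_level_columns(bad_chunk)     # "Species"
single_level_columns(iris[1:10, ])  # character(0): all three levels are kept
```

Running this on each chunk before the update would at least show which columns to look at, although it does not by itself fix the fit.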
What are my options?