Your problem is similar to the one reported here on the randomForest classifier.
Apparently glm checks through the variables in your data and throws an error because X contains only NA values.
You can fix that error in one of two ways:
- either drop X completely from your dataset by setting Cancer$X <- NULL before handing the data to glm, so that X no longer appears in your formula (glm(diagnosis ~ . - id, data = Cancer, family = binomial));
- or add na.action = na.pass to the glm call (which tells glm to keep rows that contain NA instead of dropping them) while still excluding X in the formula itself (glm(diagnosis ~ . - id - X, data = Cancer, family = binomial, na.action = na.pass)).
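As a rough illustration of both options, here is a minimal sketch on a small made-up data frame. The columns radius_mean and texture_mean and all values are hypothetical stand-ins, not your real Cancer data; only the two glm calls mirror the ones above.

```r
# Toy stand-in for the Cancer data (hypothetical columns and values)
set.seed(1)
Cancer <- data.frame(
  id           = 1:40,
  diagnosis    = factor(rep(c("B", "M"), each = 20)),
  radius_mean  = c(rnorm(20, mean = 12, sd = 2), rnorm(20, mean = 15, sd = 2)),
  texture_mean = rnorm(40, mean = 19, sd = 4),
  X            = NA  # column that contains only NA values
)

# Option 1: drop the all-NA column from a copy of the data, then fit
Cancer_noX <- Cancer
Cancer_noX$X <- NULL
fit1 <- glm(diagnosis ~ . - id, data = Cancer_noX, family = binomial)

# Option 2: keep X in the data, exclude it in the formula,
# and keep rows with NA via na.action = na.pass
fit2 <- glm(diagnosis ~ . - id - X, data = Cancer,
            family = binomial, na.action = na.pass)

# Both fits use the same predictors
names(coef(fit1))
names(coef(fit2))
```

Note that na.pass only helps here because X is the sole source of NA values; if other predictors also had missing entries, you would probably prefer option 1 and let the default NA handling drop the affected rows.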
Note, however, that you still have to provide the diagnosis variable in a form glm can digest, meaning either a numeric vector with values 0 and 1, a logical vector, or a factor:
"For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success)" - from the glm-doc
Just define Cancer$diagnosis <- as.factor(Cancer$diagnosis).
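To see which class ends up as "success", it can help to check the factor levels after the conversion. The sketch below uses a short made-up label vector and assumes your labels are "B" (benign) and "M" (malignant); adjust it if your data is coded differently.

```r
# Hypothetical labels, assuming "B" = benign and "M" = malignant
diagnosis <- c("M", "B", "B", "M", "B")

# as.factor() sorts the levels, so "B" becomes the first level (failure)
diagnosis_f <- as.factor(diagnosis)
levels(diagnosis_f)

# The 0/1 response implied by that level order: first level -> 0, others -> 1
y01 <- as.integer(diagnosis_f != levels(diagnosis_f)[1])
y01

# If you would rather treat "M" as the first (failure) level, relevel the factor
diagnosis_m <- relevel(diagnosis_f, ref = "M")
levels(diagnosis_m)
```

With the default ordering, the fitted model describes the probability of "M".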
On my end, this still leaves some warnings, but I think those come from the data or your feature selection. It does clear the blocking errors :)