I wanted to look at contrasts within the following model:
is_service ~ action_count * document_entropy
The code at the end of this post loads the full dataset.
Its structure is:
> str(dat)
'data.frame':   6432 obs. of  3 variables:
 $ action_count    : num  0.0759 0.1505 0.1435 0.1535 0.2067 ...
 $ document_entropy: num  -0.667 -0.667 -0.667 -0.667 -0.667 ...
 $ is_service      : int  0 0 0 0 0 0 0 0 0 0 ...
The target column is binary, with the following class counts:
> table(dat$is_service)
   0    1 
6291  141 
Input columns are z-normalized and distributed as follows:
Interestingly, when I fit this model (first part of the code), the procedure finishes without any warnings.
However, when I run the contrasts with stats::anova (second part of the code), it does return warnings.
Question: why is that happening, and which result should concern me more: the single model fit or the anova analysis of it?
# install RCurl if it is missing, then load it
list.of.packages <- c('RCurl')
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
library(RCurl)
# download the dataset and drop the extra X column
x <- getURL("https://rawgit.com/alexmosc/FX_Big_Experiment/master/service_train_saved.csv")
dat <- read.csv(text = x)
dat$X <- NULL
str(dat)
# first part: fit the full model, including the interaction
summary(
     glm(formula = is_service ~ action_count * document_entropy
         , family = binomial(link = 'logit')
         , data = dat
     )
)
# second part: sequential comparison of nested models
anova(
     glm(formula = is_service ~ 1
         , family = binomial(link = 'logit')
         , data = dat
     )
     , glm(formula = is_service ~ action_count
           , family = binomial(link = 'logit')
           , data = dat
     )
     , glm(formula = is_service ~ action_count + document_entropy
           , family = binomial(link = 'logit')
           , data = dat
     )
     , glm(formula = is_service ~ action_count + document_entropy + action_count:document_entropy
           , family = binomial(link = 'logit')
           , data = dat
     )
     , test = "Chisq"
)
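
Since anova() here receives four separately fitted models, the warnings may come from one of the smaller fits rather than from the full model. A minimal sketch for narrowing this down is to fit each nested formula on its own and record any warnings it raises. The helper name fit_with_warnings is hypothetical (not part of any package), and the sketch assumes dat has already been loaded as above; it does not predict which fit, if any, is responsible.

```r
# Hypothetical helper: fit a logistic regression and collect its warnings
# instead of letting them print at the top level.
fit_with_warnings <- function(formula, data) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    glm(formula, family = binomial(link = "logit"), data = data),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the warning text
      invokeRestart("muffleWarning")         # and continue fitting
    }
  )
  list(fit = fit, warnings = msgs)
}

# The same nested sequence that is passed to anova() above.
formulas <- list(
  is_service ~ 1,
  is_service ~ action_count,
  is_service ~ action_count + document_entropy,
  is_service ~ action_count * document_entropy
)

# Assumption: `dat` is the data frame loaded earlier; skipped otherwise.
if (exists("dat") && is.data.frame(dat)) {
  results <- lapply(formulas, fit_with_warnings, data = dat)
  names(results) <- vapply(formulas, function(f) deparse(f), character(1))
  print(lapply(results, function(r) r$warnings))
}
```

An empty character vector for a formula means that fit raised no warning, so any non-empty entry points at the model worth inspecting first.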


