The dataset I am working with comes from the Current Population Survey on IPUMS and has 1,716,121 observations of 13 variables. I am trying to run a cross-validation on this data and then graph the resulting AUC.
The model is a logistic regression, and my dependent variable is binary (it takes a value of either 0 or 1). Whenever I run the code, I get the warning:
In bind_rows_(x, .id) : Vectorizing 'labelled' elements may not preserve their attributes.
I am not sure what this means.
I also get the errors:
Error in select(., .id, outcome, pred) : unused arguments (.id, outcome, pred)
and
Error in summarise_impl(.data, dots) : Evaluation error: object 'outcome' not found.
If someone could help me with this, it would be greatly appreciated!
My code is:
    # Model formula: every predictor is treated as a factor
    mod1_formula <- formula("self_employ ~
        as.factor(educ_level) +
        as.factor(SEX) +
        as.factor(RACE) +
        as.factor(NCHILD)")

    # Fit a logistic regression on the training part of each of the 2 folds
    cps_data %>%
        crossv_kfold(k = 2) %>%
        mutate(model = purrr::map(train, ~ glm(mod1_formula, data = ., family = binomial))) -> trained.models

    # First attempt: predictions only
    trained.models %>%
        unnest(pred = map2(model, test, ~ predict(.x, .y, type = "response"))) -> test.predictions

    # Second attempt: augmented test data plus predictions (overwrites the result above)
    trained.models %>%
        unnest(
            fitted = map2(model, test, ~ augment(.x, newdata = .y)),
            pred = map2(model, test, ~ predict(.x, .y, type = "response"))
        ) -> test.predictions

    test.predictions %>% select(.id, outcome, pred)

    test.predictions %>%
        group_by(.id) %>%
        summarize(auc = roc(outcome, .fitted)$auc) %>%
        select(auc)

    gg <- ggplot(data = test.predictions, aes(x = auc))
    gg <- gg + geom_histogram()
    gg
 
     
    