I am trying to build a model that predicts whether a product will be sold on an e-commerce website, with 1 or 0 as the output.
My data consists of a handful of categorical variables (one with a large number of levels), a couple of binary variables, and one continuous variable (the price). The outcome variable is 1 or 0, indicating whether or not the product listing sold.
This is my code:
library(caret)
library(gbm)
inTrainingset <- createDataPartition(C$Sale, p = .75, list = FALSE)
CTrain <- C[inTrainingset, ]
CTest <- C[-inTrainingset, ]
gbmfit <- gbm(Sale ~ ., data = C, distribution = "bernoulli", n.trees = 5, interaction.depth = 7, shrinkage = .01)
plot(gbmfit)
gbmTune<-train(Sale~.,data=CTrain, method="gbm")
ctrl<-trainControl(method="repeatedcv",repeats=5)
gbmTune<-train(Sale~.,data=CTrain, 
           method="gbm", 
           verbose=FALSE, 
           trControl=ctrl)
ctrl <- trainControl(method = "repeatedcv", repeats = 5, classProbs = TRUE, summaryFunction = twoClassSummary)
# This is the call that raises the first error quoted below.
gbmTune <- trainControl(Sale ~ ., data = CTrain,
                        method = "gbm",
                        metric = "ROC",
                        verbose = FALSE,
                        trControl = ctrl)
grid <- expand.grid(.interaction.depth = seq(1, 7, by = 2), .n.trees = seq(100, 300, by = 50), .shrinkage = c(.01, .1))
set.seed(1)
gbmTune <- train(Sale ~ ., data = CTrain,
                 method = "gbm",
                 metric = "ROC",
                 tuneGrid = grid,
                 verbose = FALSE,
                 trControl = ctrl)
I am running into two issues. The first is that when I add summaryFunction=twoClassSummary and then tune, I get this:
Error in trainControl(Sale ~ ., data = CTrain, method = "gbm", metric = "ROC",  : 
  unused arguments (data = CTrain, metric = "ROC", trControl = ctrl)
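Reading the message again, the unused arguments suggest that the failing call is the one to trainControl(), which only sets up resampling and does not accept data, metric, or trControl; those belong to train(). Below is a minimal sketch of that distinction, assuming a reasonably recent caret. fit_gbm is a hypothetical helper name used only for illustration, and it is defined here but not run:

```r
library(caret)

# trainControl() only describes the resampling scheme; it has no data,
# metric, or trControl arguments, which matches the "unused arguments" error.
c("data", "metric", "trControl") %in% names(formals(trainControl))

ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# fit_gbm is a hypothetical wrapper: the formula, data, and metric go to
# train(), and the control object is passed in through trControl.
fit_gbm <- function(train_df, ctrl) {
  train(Sale ~ ., data = train_df,
        method = "gbm", metric = "ROC",
        verbose = FALSE, trControl = ctrl)
}
```

If this reading is right, swapping trainControl for train in the failing call should make the first error go away, although the regression problem would still remain.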
The second problem, if I bypass the summaryFunction, is that when I try to run the model I get this error:
Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels,  : 
  train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression
I tried changing the output variable from a numeric value of 1 or 0 to a text value in Excel, but that didn't make a difference.
Any help on how to stop the model from being treated as a regression, or on the first error message, would be greatly appreciated.
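As far as I understand, train() decides between regression and classification from the class of the outcome, so a numeric 0/1 Sale column would be treated as regression. Below is a minimal sketch of converting the outcome to a factor inside R before training. The data frame is a made-up stand-in (the column names Price, Category and FreeShip are assumptions, not my real columns), the grid includes n.minobsinnode on the assumption that a recent caret version expects it, and the gbm and pROC packages are assumed to be installed:

```r
library(caret)  # train(), trainControl(), twoClassSummary()

set.seed(1)
# Hypothetical stand-in for the real listings data
n <- 400
C <- data.frame(
  Price    = runif(n, 5, 500),
  Category = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  FreeShip = sample(0:1, n, replace = TRUE)
)
C$Sale <- rbinom(n, 1, plogis(1 - 0.005 * C$Price + 0.5 * C$FreeShip))

# Convert the numeric outcome to a factor so train() fits a classifier.
# The labels must be valid R names ("0"/"1" are not) because classProbs = TRUE
# uses them as column names; twoClassSummary treats the first level as the event.
C$Sale <- factor(C$Sale, levels = c(1, 0), labels = c("Sold", "NotSold"))

inTrain <- createDataPartition(C$Sale, p = 0.75, list = FALSE)
CTrain  <- C[inTrain, ]

# Small resampling scheme and grid so the sketch runs quickly
ctrl <- trainControl(method = "cv", number = 3,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
grid <- expand.grid(interaction.depth = c(1, 3), n.trees = c(50, 100),
                    shrinkage = 0.1, n.minobsinnode = 10)

gbmTune <- train(Sale ~ ., data = CTrain, method = "gbm", metric = "ROC",
                 tuneGrid = grid, verbose = FALSE, trControl = ctrl)
gbmTune$modelType
```

Checking is.factor(CTrain$Sale) before calling train() is probably the quickest way to confirm whether the conversion actually took effect on the real data.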
Best,
Will will@nubimetrics.com
 
     
     
    