I ran into the following issue when trying to extract predicted probabilities from a support vector machine (SVM). Usually the probability cutoff for a classification algorithm is 0.5, but I need to analyze how the accuracy changes with the probability cutoff for the SVM algorithm (I sketch the intended cutoff analysis at the end of this post).
I used the caret package in R with leave-one-out cross-validation (LOOCV).
First I fitted a regular SVM model without extracting the class probabilities, so it only stores the predicted class labels.
Data source: https://www.kaggle.com/uciml/pima-indians-diabetes-database
library(caret)
set.seed(123)

# Pima Indians diabetes data; the outcome is coded 0/1
diabetes <- read.csv("C:/Users/Downloads/228_482_bundle_archive/diabetes.csv")
diabetes$Outcome <- factor(diabetes$Outcome)

# LOOCV, saving the held-out predictions (class labels only)
fitControl1 <- trainControl(method = "LOOCV", savePredictions = TRUE, search = "random")

modelFitlassocvintm1 <- train(Outcome ~ Pregnancies + BloodPressure + Glucose +
                                BMI + DiabetesPedigreeFunction + Age,
                              data = diabetes,
                              method = "svmRadialSigma",
                              trControl = fitControl1,
                              preProcess = c("center", "scale"),
                              tuneGrid = expand.grid(.sigma = 0.004930389,
                                                     .C = 9.63979626))
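As a quick check, the saved predictions from this first fit contain only the hard class labels:

# the held-out LOOCV predictions: columns include pred, obs and rowIndex
# (plus the tuning parameters), but no probability columns
head(modelFitlassocvintm1$pred)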
To extract the predicted probabilities, I need to specify classProbs = T inside trainControl.
set.seed(123)

# same setup, but now also requesting class probabilities
fitControl2 <- trainControl(method = "LOOCV", savePredictions = TRUE, classProbs = TRUE)

modelFitlassocvintm2 <- train(make.names(Outcome) ~ Pregnancies + BloodPressure + Glucose +
                                BMI + DiabetesPedigreeFunction + Age,
                              data = diabetes,
                              method = "svmRadialSigma",
                              trControl = fitControl2,
                              preProcess = c("center", "scale"),
                              tuneGrid = expand.grid(.sigma = 0.004930389,
                                                     .C = 9.63979626))
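I wrapped the outcome in make.names() because, with classProbs = T, caret requires the class levels to be valid R variable names, so the levels 0/1 become X0/X1. I believe an equivalent and slightly cleaner alternative is to rename the levels once, up front:

# alternative (sketch): rename the factor levels so that classProbs = TRUE
# works without make.names() inside the model formula
levels(diabetes$Outcome) <- make.names(levels(diabetes$Outcome))   # "X0", "X1"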
Apart from make.names() on the outcome, which only renames the factor levels, the only relevant difference between modelFitlassocvintm1 and modelFitlassocvintm2 is the inclusion of classProbs = T inside trainControl (search = "random" has no effect here because the tuning grid is fixed).
If I compare the predicted classes of modelFitlassocvintm1 and modelFitlassocvintm2, they should be identical under a 0.5 probability cutoff. But that is not the case:
table(modelFitlassocvintm2$pred$X1 > 0.5, modelFitlassocvintm1$pred$pred)
       
          0   1
  FALSE 560   0
  TRUE    8 200
When I investigated the 8 disagreeing observations further, I got the following results.
subs1 <- cbind(modelFitlassocvintm2$pred$X1,
               modelFitlassocvintm2$pred$pred,
               modelFitlassocvintm1$pred$pred)
subset(subs1, subs1[, 2] != subs1[, 3])
          [,1] [,2] [,3]
[1,] 0.5078631    2    1
[2,] 0.5056252    2    1
[3,] 0.5113336    2    1
[4,] 0.5048708    2    1
[5,] 0.5033003    2    1
[6,] 0.5014327    2    1
[7,] 0.5111975    2    1
[8,] 0.5136453    2    1
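Here cbind() coerces both factors to their integer codes: in column 2 the value 2 means modelFitlassocvintm2 predicted class X1, and in column 3 the value 1 means modelFitlassocvintm1 predicted class 0. A data.frame version (sketch) keeps the labels readable:

# same comparison, but keeping the factor labels
subs2 <- data.frame(prob  = modelFitlassocvintm2$pred$X1,
                    pred2 = modelFitlassocvintm2$pred$pred,
                    pred1 = modelFitlassocvintm1$pred$pred)
subset(subs2, pred2 != pred1)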
It seems that when the predicted probability is close to 0.5, there is a discrepancy between the predicted classes of modelFitlassocvintm1 and modelFitlassocvintm2. I saw a similar discrepancy for SVM on a different data set as well.
What may be the reason for this? Can't we trust the predicted probabilities from SVM? Usually an SVM classifies a subject as -1 or 1 depending on which side of the hyperplane it lies on. Is it therefore a bad idea to rely on the predicted probabilities from an SVM?
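For context, the cutoff analysis I ultimately want to run looks roughly like this (a minimal sketch against the saved LOOCV predictions of the second model; it assumes the X1 column holds the probability of class X1 and obs holds the true labels):

# LOOCV accuracy as a function of the probability cutoff
cutoffs <- seq(0.1, 0.9, by = 0.05)
acc <- sapply(cutoffs, function(k) {
  predClass <- ifelse(modelFitlassocvintm2$pred$X1 > k, "X1", "X0")
  mean(predClass == modelFitlassocvintm2$pred$obs)
})
plot(cutoffs, acc, type = "b",
     xlab = "probability cutoff", ylab = "LOOCV accuracy")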