OpenCV: Good Training Output but Cascade Classifier is Poor

Question

Very new to OpenCV, and trying my hand at training a Haar classifier that can detect images of dogs from side-on. I have used this tutorial as a guide. The author suggests that a relatively effective classifier can be trained using a surprisingly small number of sample images. As per his directions, I collected 40 positive and 600 negative images, then used the script provided to generate many more samples in the form of .vec files. Training took about a week and a half through 20 stages with the following parameters:

<?xml version="1.0"?>
<opencv_storage>
<params>
  <stageType>BOOST</stageType>
  <featureType>HAAR</featureType>
  <height>64</height>
  <width>80</width>
  <stageParams>
    <boostType>GAB</boostType>
    <minHitRate>9.9900001287460327e-01</minHitRate>
    <maxFalseAlarm>5.0000000000000000e-01</maxFalseAlarm>
    <weightTrimRate>9.4999999999999996e-01</weightTrimRate>
    <maxDepth>1</maxDepth>
    <maxWeakCount>100</maxWeakCount></stageParams>
  <featureParams>
    <maxCatCount>0</maxCatCount>
    <featSize>1</featSize>
    <mode>ALL</mode></featureParams></params>
</opencv_storage>

During the last stage, the Neg Count Acceptance Ratio was down to 0.000579, which I took to mean that 0.0579% of negative samples were being wrongly classified as positive (i.e. reported as containing a dog when they didn't). In other words, 99.942% of negative samples were being correctly rejected. These seemed like pretty good numbers to me; however, when I plugged the classifier .xml file into a face-detection program, the results were awful.

This is a picture of the classifier being used to analyse a completely black image (camera of the device sat flat against a bench-top to prevent any light from getting in):

(Picture a black screen with several green rectangle borders randomly positioned, some overlapping. Sadly it seems I don't have the necessary reputation to post the real thing...)

My best guess at fixing the classifier is that I need to retrain with a much larger pool of negative and positive samples.

What I really want to know is this: why are the Acceptance Ratio and the real-world performance of the classifier so different? Have I misunderstood the meaning of the Acceptance Ratio? If my understanding of the Ratio is correct, what kind of number should I expect will give me an effective classifier?

Any help would be greatly appreciated.

score 1 · Answer 1 · answered Jul 25 '15 at 20:00

When the acceptance ratio at test time is much worse than the acceptance ratio during training, there are two possibilities:

The training samples (positive and negative patches) are very different from the test samples. In this case you should increase the number of samples so that the trained classifier generalizes better.
The learned classifier is overfitted. In this case the acceptance ratio achieved during training is very small (on the order of 1e-6). This problem usually arises when the number of positive and negative samples is small compared to the number of stages. You can therefore avoid overfitting by reducing the number of stages or by increasing the number of training samples (both positive and negative).
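
To put the "order of 1e-6" figure in context, here is a rough sketch of the targets implied by the parameters in the question. It assumes the stages behave independently, which is only an approximation, so treat the numbers as design targets and not as measured performance. The function name `cascade_targets` is made up for this illustration.

```python
# Values taken from the params XML and the text of the question.
MIN_HIT_RATE = 0.999     # <minHitRate>, per stage
MAX_FALSE_ALARM = 0.5    # <maxFalseAlarm>, per stage
NUM_STAGES = 20          # number of stages trained


def cascade_targets(min_hit_rate, max_false_alarm, num_stages):
    """Return (overall hit-rate floor, overall false-alarm ceiling).

    Assumption: stages are independent, so per-stage rates multiply.
    """
    overall_hit_rate = min_hit_rate ** num_stages
    overall_false_alarm = max_false_alarm ** num_stages
    return overall_hit_rate, overall_false_alarm


hit, fa = cascade_targets(MIN_HIT_RATE, MAX_FALSE_ALARM, NUM_STAGES)
print(f"overall hit rate target    >= {hit:.4f}")
print(f"overall false alarm target <= {fa:.2e}")
```

With 20 stages at 0.5 each, the false-alarm ceiling works out to roughly 9.5e-7, which is where the 1e-6 order of magnitude comes from. The 0.000579 reported in the question is well above that.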

You can check both possibilities. I also recommend trying other feature types such as HOG and LBP. To do this you only need to change featureType to HOG or LBP.

The number of positive and negative samples you need depends on how diverse the samples are. If the object's appearance varies widely in the test images, you need more positive samples (>500) to cover all possible appearances (the same applies to the negative samples).

Do not forget to tune the detection parameters when testing on images (minNeighbors, scaleFactor, minSize and maxSize).
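
These parameters matter because the detector evaluates a very large number of windows per image, so even a small per-window false positive rate adds up. The sketch below is a simplified model, not OpenCV's actual scanning code: it assumes the acceptance ratio from training approximates the per-window false positive rate, and it assumes a 640x480 frame and a step of 2 pixels. `count_windows` is a hypothetical helper written for this illustration.

```python
def count_windows(img_w, img_h, win_w=80, win_h=64, scale_factor=1.1, step=2):
    """Approximate how many windows a sliding-window detector evaluates.

    Simplified model: the image is shrunk by scale_factor at each level
    and a fixed win_w x win_h window slides over it with the given step.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    total = 0
    scale = 1.0
    while True:
        scaled_w = int(img_w / scale)
        scaled_h = int(img_h / scale)
        # Stop once the window no longer fits in the shrunken image.
        if scaled_w < win_w or scaled_h < win_h:
            break
        positions_x = (scaled_w - win_w) // step + 1
        positions_y = (scaled_h - win_h) // step + 1
        total += positions_x * positions_y
        scale *= scale_factor
    return total


ACCEPTANCE_RATIO = 0.000579          # reported in the question
windows = count_windows(640, 480)    # frame size is an assumption
expected_false_positives = windows * ACCEPTANCE_RATIO
print(f"windows scanned: {windows}")
print(f"expected raw false positives: {expected_false_positives:.1f}")
```

Under these assumptions the finest scale alone contributes tens of thousands of windows, so a rate of 0.0579% per window still yields dozens of raw false detections per frame. Raising minNeighbors discards isolated hits, while a larger scaleFactor or minSize reduces the number of windows scanned in the first place.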

Thanks Ali, that's really helpful. Just to be crystal clear, if you expect overfitting to be indicated by an Acceptance Ratio in the order of 1e-6, then it is more likely that my classifier (with a Ratio of 0.000579) simply needs more samples to learn from, placing it in the first of your two categories. Would that be right? — rustyDog, Jul 26 '15 at 22:37
The only problem I have with placing my problem in this category is that my classifier still doesn't work even when I test it using the positive images it was trained with (i.e. zero variation between training and testing samples). In this way, the details of my problem don't seem to fit neatly into either of the two options. I will experiment with LBP and HOG as you have suggested. Thanks again for your response. — rustyDog, Jul 26 '15 at 22:37
