I'm having trouble finding the right parameters for the information gain. I don't have any discrete values, so I first need to discretize these points into intervals.
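A common way to discretize a continuous feature for information gain is to sort the samples by value, try every cut point between neighbouring distinct values, and keep the cut with the highest gain (the binary split used by C4.5-style decision trees). Applying it recursively to each side gives more intervals. The sketch below assumes binary labels (object = 1, background = 0). The function names `entropy` and `best_threshold` are hypothetical, not part of any library, and the O(n²) loop is kept simple for clarity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the single cut that maximizes information gain.

    Candidate thresholds are midpoints between neighbouring distinct sorted values.
    Returns (None, 0.0) if no cut improves on the parent entropy.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut possible between equal values
        t = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Conditional entropy: weighted average of the two children.
        cond = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - cond
        if gain > best[1]:
            best = (t, gain)
    return best
```

For example, `best_threshold([10, 20, 30, 200, 210, 220], [0, 0, 0, 1, 1, 1])` picks the cut at 115.0 with a gain of 1 bit, because both sides become pure.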
What I have:
I'm doing image processing, where my features have a possible range of 0-255. Using some training data, I can define intervals that only encode "is object" or "is not object". Let goods be the number of intervals for a matching point, and bads the number of intervals labeled as its environment. I calculate it as follows:

The information gain for this case is:

where

Results and idea:
For some reason I end up with a negative IG, which makes no sense, but I can't find the error. Another idea: instead of counting the object-matching intervals for good, count the samples in good that fall into any good interval.
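The second idea (counting samples rather than intervals) can be sketched as the information gain of the binary test "value lies in some good interval" against the object/background labels. Because the parent entropy and the child entropies all come from one 2x2 table of sample counts, the result equals the mutual information between the test and the label and cannot be negative. One possible cause of a negative value, offered only as a guess since the original formulas are not shown, is computing the parent term from one kind of count (e.g. intervals) and the child terms from another (e.g. samples). The names `interval_information_gain`, `is_object` and `good_intervals` are hypothetical, and the intervals are assumed to be inclusive `(lo, hi)` pairs.

```python
import math


def entropy_from_counts(counts):
    """Shannon entropy (in bits) from a list of class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def in_any_interval(x, intervals):
    """True if x lies in at least one inclusive (lo, hi) interval."""
    return any(lo <= x <= hi for lo, hi in intervals)


def interval_information_gain(values, is_object, good_intervals):
    """IG of the test 'value is in a good interval' w.r.t. object/background labels.

    All quantities are sample counts from the same 2x2 table, so the
    result is a mutual information and is always >= 0.
    """
    # table[inside][label]: inside is 0/1, label is 0 (background) or 1 (object)
    table = [[0, 0], [0, 0]]
    for x, y in zip(values, is_object):
        table[int(in_any_interval(x, good_intervals))][int(y)] += 1
    n = len(values)
    # Parent entropy: label distribution over all samples.
    parent = entropy_from_counts([table[0][0] + table[1][0],
                                  table[0][1] + table[1][1]])
    # Conditional entropy: weighted entropy inside vs. outside the good intervals.
    cond = sum(sum(row) / n * entropy_from_counts(row)
               for row in table if sum(row) > 0)
    return parent - cond
```

With pixel values `[10, 20, 30, 200, 210, 220, 90]`, labels `[False, False, False, True, True, True, False]` and a single good interval `(180, 255)`, the test separates the classes perfectly, so the gain equals the parent entropy of about 0.985 bits.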
Does anyone have an idea?