Context
I'm working on a Python 3.11 project and I'm having a hard time understanding how the float type works.
More specifically, I'm computing distances between data points, and I also have thresholds for these distances.
Let me explain step by step.
I have a distance called threshold, which is a numpy.float32. It is the distance between two arbitrary data points, and I use it as a threshold for other distances. Before using it, though, I floor it to 10 decimal places:
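import math  # display() comes from IPython/Jupyter
display(threshold)
# Floor to 10 decimal places: scale up, drop the fraction, scale back down
threshold_floored = math.floor(threshold * 10000000000)/10000000000
display(threshold_floored)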
>>> output:
    0.16666667
    0.1666666716
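A minimal sketch of what the flooring step does, assuming the float32 threshold has already been widened to a Python float (the literal below is the exact value printed in the 60-digit output further down). The floor itself is exact and yields an integer; the rounding error only enters at the final division, which returns the nearest double to 0.1666666716 rather than that decimal exactly.

```python
import math
from decimal import Decimal

# float32 value of threshold (from the question's 60-digit output), held exactly in a double
threshold = 0.16666667163372039794921875
scale = 10**10  # keep 10 decimal places

scaled_int = math.floor(threshold * scale)  # math.floor returns an exact Python int
threshold_floored = scaled_int / scale      # int / int rounds to the nearest double

print(scaled_int)                   # 1666666716
print(threshold_floored)            # 0.1666666716 (shortest repr that round-trips)
print(Decimal(threshold_floored))   # exact stored value: 0.1666666716000000103559...
```

`Decimal(float)` converts the binary value exactly, so it shows what `"{0:.60f}".format(...)` shows, without the trailing zeros.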
I then use a clustering algorithm that creates clusters based on distance, with threshold_floored as the threshold. Points in cluster A should be at a distance smaller than or equal to threshold_floored from points in cluster B. If for some reason the distance between a point in cluster A and a point in cluster B is greater than threshold_floored, I print a message to notify me of this error.
Running my code, I sometimes see the printed message, but when I check the values I get this:
display(threshold_floored)
display(distance_pointsAB)
>>> output:
    0.1666666716
    0.16666667
The displayed distance is less than threshold_floored (and equal to threshold), so why do I get the notification?
For reference, the notification code is:
if distance_pointsAB > threshold_floored:
    print("Notification")
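A sketch of why the check fires even though the displayed values suggest it shouldn't. `to_float32` is a hypothetical helper that rounds a Python float to float32 with the standard-library `struct` module, standing in for `float(np.float32(x))`; the distance is taken as the float32 rounding of 1/6 only because that reproduces the value printed in the question. The short display hides digits: printed with more significant digits, the distance is larger than threshold_floored.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest float32 and return it as a Python float.

    Hypothetical stand-in for float(np.float32(x)), using only the standard library.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


threshold_floored = 0.1666666716        # a double, as produced by math.floor(...)/10**10
distance_pointsAB = to_float32(1 / 6)   # hypothetical float32 distance equal to threshold

print(f"{distance_pointsAB:.8g}")   # 0.16666667 (looks smaller than 0.1666666716)
print(f"{distance_pointsAB:.17g}")  # 0.1666666716337204 (actually larger)
print(distance_pointsAB > threshold_floored)  # True, so the notification fires
```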
Problem
However, I noticed the following:
distance_pointsAB_floored = math.floor(distance_pointsAB * 10000000000)/10000000000
display(threshold)
display(threshold_floored)
display(distance_pointsAB)
display(distance_pointsAB_floored)
print("{0:.60f}".format(threshold))
print("{0:.60f}".format(threshold_floored))
print("{0:.60f}".format(distance_pointsAB))
print("{0:.60f}".format(distance_pointsAB_floored))
>>> output:
    0.16666667
    0.1666666716
    0.16666667
    0.1666666716
    0.166666671633720397949218750000000000000000000000000000000000 <---- threshold
    0.166666671600000010355913104831415694206953048706054687500000 <---- threshold_floored
    0.166666671633720397949218750000000000000000000000000000000000 <---- distance_pointsAB
    0.166666671600000010355913104831415694206953048706054687500000 <---- distance_pointsAB_floored
The notification now makes sense: with more decimal places shown, distance_pointsAB is indeed greater than threshold_floored.
But why doesn't math.floor round threshold or distance_pointsAB down to exactly 0.166666671600000000000000000000000000000000000000000000000000?
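A short check, using only the standard library, of why no float can hold 0.1666666716 exactly: every finite float is an integer divided by a power of two, while this 10-place decimal reduces to a fraction whose denominator still contains factors of 5. The names `stored` and `target` are illustrative.

```python
from fractions import Fraction

stored = 0.1666666716                  # Python parses this to the nearest double
num, den = stored.as_integer_ratio()   # exact value of that double as num / den
print(den & (den - 1) == 0)            # True: the denominator is a power of two

target = Fraction(1666666716, 10**10)  # the exact decimal 0.1666666716, reduced
print(target.denominator)              # 2500000000 == 2**8 * 5**10
print(Fraction(num, den) == target)    # False: no double equals it exactly
print(float(target) == stored)         # True: stored is simply the closest double
```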
Also, since my clustering algorithm should separate points in cluster A and cluster B if their distance is less than my threshold, and I used threshold_floored as the criterion, why do some points in A and B have a distance greater than the threshold? It seems that my clustering algorithm used threshold instead of threshold_floored. Am I right?
Is there a way to work properly with floats?
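One common way to make threshold checks robust is to compare with a tolerance instead of exactly, using `math.isclose`. This is a sketch of that general technique, not of the clustering library's behaviour; `exceeds` is a hypothetical helper, and `REL_TOL = 1e-6` is an assumed value, roughly eight times float32's machine epsilon (about 1.19e-7), that should be tuned to the data.

```python
import math

threshold_floored = 0.1666666716
# float32 value from the question, held exactly in a double
distance_pointsAB = 0.16666667163372039794921875

REL_TOL = 1e-6  # hypothetical tolerance, a few float32 ulps relative to the magnitude


def exceeds(distance: float, threshold: float, rel_tol: float = REL_TOL) -> bool:
    """True only if distance is larger than threshold by more than the tolerance."""
    return distance > threshold and not math.isclose(distance, threshold, rel_tol=rel_tol)


print(distance_pointsAB > threshold_floored)           # True: exact comparison
print(exceeds(distance_pointsAB, threshold_floored))  # False: within tolerance
print(exceeds(0.17, threshold_floored))               # True: a genuine violation
```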
EDIT
I found the problem. My threshold was a numpy.float32, and flooring it converted it into a Python float. But then my clustering algorithm converted threshold_floored back to numpy.float32, while distance_pointsAB ended up as a Python float. The solution is to set the value types consistently.
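A sketch reproducing the type mismatch described in the edit, under stated assumptions: `to_float32` is a hypothetical `struct`-based stand-in for `np.float32` rounding, the distance is the float32 rounding of 1/6 (chosen only to match the printed values), and `clustering_threshold` models the clustering step casting threshold_floored back to float32. Casting 0.1666666716 to float32 lands back on the original threshold, so the clustering accepts the pair while the double-precision check flags it.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Hypothetical stand-in for float(np.float32(x)): round x to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


threshold = to_float32(1 / 6)                                  # models the numpy.float32 threshold
threshold_floored = math.floor(threshold * 10**10) / 10**10   # now a Python float (double)
distance = to_float32(1 / 6)                                   # hypothetical distance equal to threshold

# Clustering step (assumed): compares against threshold_floored cast back to float32
clustering_threshold = to_float32(threshold_floored)
accepted_by_clustering = distance <= clustering_threshold  # True
# Notification check: compares against the double threshold_floored
flagged_by_check = distance > threshold_floored            # True, contradicting the clustering

# Fix: use one threshold value, of one type, in both places
flagged_consistently = distance > clustering_threshold     # False

print(clustering_threshold == threshold)  # True: the float32 cast undoes the floor
print(accepted_by_clustering, flagged_by_check, flagged_consistently)
```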
Thanks, everybody, for your advice!
comment may not be what you care about. Distances are best modeled, I think, as real numbers, and floating-point provides a decent approximation of real numbers, albeit with finite precision. But the finite precision is a certain number of *bits* in base 2, not digits in base 10. When you print finite-precision base-2 fractions out in decimal, they look weird. – Steve Summit Jan 17 '23 at 17:55