Here's my code:
int f(double x)
{
return isnan(x);
}
If I #include <cmath> I get this assembly:
xorl %eax, %eax
ucomisd %xmm0, %xmm0
setp %al
This is reasonably clever: ucomisd sets the parity flag if the comparison of x with itself is unordered, which happens only when x is NaN. Then setp copies the parity flag into the result (setp writes only a single byte, hence the initial clearing of %eax).
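As a quick sanity check of that reasoning, here is a minimal, self-contained C sketch of the property the sequence relies on (the function name is mine, and NAN comes from <math.h>): a NaN is the only value that compares unequal to itself, which is exactly the unordered case ucomisd flags.
#include <math.h>    /* NAN */
#include <stdio.h>

/* NaN is the only floating-point value that compares unequal to itself;
   that unordered self-comparison is what ucomisd reports via the parity flag. */
static int nan_by_self_compare(double x)
{
    return x != x;
}

int main(void)
{
    printf("%d %d\n", nan_by_self_compare(NAN), nan_by_self_compare(1.0)); /* prints: 1 0 */
    return 0;
}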
But if I #include <math.h> I get this assembly:
jmp __isnan
Now the code is not inlined, and the __isnan function is certainly no faster than the single ucomisd instruction, so we have incurred a jump for no benefit. I get the same thing if I compile the code as C.
Now if I change the isnan() call to __builtin_isnan(), I get the simple ucomisd instruction regardless of which header I include, and it works in C too. Likewise if I just return x != x.
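For reference, a sketch of that variant (the function name is just for illustration); __builtin_isnan is a GCC built-in, so no math header is needed at all:
int f_builtin(double x)
{
    return __builtin_isnan(x);   /* GCC built-in; compiles to the ucomisd/setp sequence shown above */
}
As an aside, any of these NaN tests may be optimized away under -ffast-math / -ffinite-math-only, but only -O2 and -O3 are in play here.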
So my question is, why does the C <math.h> header provide a less efficient implementation of isnan() than the C++ <cmath> header? Are people really expected to use __builtin_isnan(), and if so, why?
I tested GCC 4.7.2 and 4.9.0 on x86-64 with -O2 and -O3 optimization.