Fused multiply add and default rounding modes

Question

With GCC 5.3, the following code compiled with -O3 -mfma

float mul_add(float a, float b, float c) {
  return a*b + c;
}

produces the following assembly

vfmadd132ss     %xmm1, %xmm2, %xmm0
ret

I noticed GCC doing this with -O3 already in GCC 4.8.

Clang 3.7 with -O3 -mfma produces

vmulss  %xmm1, %xmm0, %xmm0
vaddss  %xmm2, %xmm0, %xmm0
retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC with -O3 -mfma.

I am surprised that GCC does this with -O3, because this answer says

The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.

This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.

However, this link says

Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.

So now I am confused and concerned.

Is GCC justified in using FMA with -O3?
Does fusing violate strict IEEE floating-point behaviour?
If fusing does violate IEEE floating-point behaviour, and since GCC defines __STDC_IEC_559__, isn't this a contradiction?

Since FMA can be emulated in software, it seems to me there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.

Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast and with Clang it's not. Other options such as -ffp-contract=on and -ffp-contract=off do not produce the FMA instruction.

For example Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.

I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.

With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.

GCC uses fma with

-ffp-contract=fast (default).

Clang uses fma

with -ffp-contract=fast.
with -ffp-contract=on (default) and #pragma STDC FP_CONTRACT ON (default is OFF).

In other words, with Clang you can get fma with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.

I looked into MSVC and ICC.

With MSVC it uses the fma instruction with /O2 /arch:AVX2 /fp:fast. With MSVC /fp:precise is the default.

With ICC it uses fma with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because by default ICC uses -fp-model fast. But ICC uses fma even with -fp-model precise. To disable fma with ICC use -fp-model strict or -no-fma.

So by default GCC and ICC use fma when fma is enabled (with -mfma for GCC/Clang or -march=core-avx2 with ICC) but Clang and MSVC do not.

I'm pretty sure what gcc is doing is ok. After reading the FLT_EVAL_METHOD doc about contracting FP expressions, I'm surprised `clang` *doesn't* do this. I'm not posting this as an answer, since it's not based on any real standards documentation, just my understanding of how *I* think things should work / should have been designed, given the material in the question. — Peter Cordes, Dec 23 '15 at 13:04
@FUZxxl, do you think the floating point tag would be more appropriate than ieee-754? (if so feel free to change it). I feel like I should be using the floating point tag as well. — Z boson, Dec 23 '15 at 13:06
"Does fusing violate strict IEEE floating-point behavior?" --> IMO, yes. Use `double fma(double x, double y, double z);`instead as that is a function call that in an optimized compiler will call the expected assembly code. This does not violate "IEEE floating-point behaviour". — chux - Reinstate Monica, Dec 23 '15 at 13:23
Does this answer your question? [Difference in gcc -ffp-contract options](https://stackoverflow.com/questions/43352510/difference-in-gcc-ffp-contract-options) — Sang, Apr 19 '20 at 23:14

score 6 · Accepted Answer · answered Jan 15 '16 at 19:03

It doesn't violate IEEE-754, because IEEE-754 defers to languages on this point:

A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:

...

― Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition.

In standard C, the STDC FP_CONTRACT pragma provides the means to control this value-changing optimization. So GCC is licensed to perform the fusion by default, so long as it allows you to disable the optimization by setting STDC FP_CONTRACT OFF. Not supporting that means not adhering to the C standard.
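As a minimal sketch of what that control looks like in source, the pragma can be placed at the start of a compound statement. The function name `mul_add_unfused` is hypothetical. Note the observation from the question: GCC was found to ignore this pragma, so with GCC the command-line option -ffp-contract=off is what actually disables fusion.

```c
/* Ask the compiler not to contract a*b + c into a fused multiply-add
   inside this function. The pragma must come before any declarations
   or statements in the block it applies to. A compiler that ignores
   the pragma may still fuse the expression. */
float mul_add_unfused(float a, float b, float c) {
    #pragma STDC FP_CONTRACT OFF
    return a * b + c;
}
```

On a compiler that honors the pragma, this function should compile to a separate multiply and add even when contraction is enabled for the rest of the translation unit.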

answered Jan 15 '16 at 19:03

Stephen Canon

What do you mean by "Not supporting that means not adhering to the C standard"? Incidentally, GCC seems to ignore `STDC FP_CONTRACT`. Instead it only uses `-ffp-contract`. Clang recognizes both. – Z boson Jan 16 '16 at 18:58
I mean that FP_CONTRACT is part of the C standard. To ignore it is to not conform. – Stephen Canon Jan 16 '16 at 19:38
Oh, I did not realize you were referring to GCC not supporting `FP_CONTRACT` (or any compiler which does not support it). Now I understand. – Z boson Jan 16 '16 at 20:05
So [this answer](http://stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-sse-avx/15933677#15933677) is wrong then "the compiler will violate strict IEEE floating-point behavior by fusing"? That's what threw me off. – Z boson Jan 16 '16 at 20:08
The standard defers to languages to set policy for this, so if an implementation doesn't adhere to the language standard, it's definitely at least violating the spirit of IEEE 754. – Stephen Canon Jan 16 '16 at 20:13
If GCC did recognize `FP_CONTRACT`, it would be free to have `ON` as the default, so then the answer would be wrong. And in any case GCC supports `-ffp-contract`, which effectively does the same thing. Let me put it a different way. Clang recognizes `FP_CONTRACT` and defaults to `OFF`. Would Clang violate IEEE if it defaulted to `FP_CONTRACT ON`? – Z boson Jan 16 '16 at 20:21
The default can be either ON or OFF. But you need to support the pragma to conform to the standard. – Stephen Canon Jan 16 '16 at 20:24

score 4 · Answer 2 · answered Dec 23 '15 at 13:40

When you quoted that fused multiply-add is allowed, you left out the important condition "unless pragma FP_CONTRACT is off". That pragma is a newish feature in C (I think introduced in C99), and it was made absolutely necessary by PowerPC processors, which all had fused multiply-add from the start. In fact, x*y was equivalent to fma(x, y, 0) and x+y was equivalent to fma(1.0, x, y).

FP_CONTRACT is what controls fused multiply/add, not FLT_EVAL_METHOD. Although if FLT_EVAL_METHOD allows higher precision, then contracting is always legal; just pretend that the operations were performed with very high precision and then rounded.
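The "pretend the operations were performed with higher precision" idea can be sketched for float operands by evaluating in double. The name `mul_add_wide` is hypothetical. This is only an illustration: the product of two floats is exact in double, but the addition is still rounded to double before the final rounding to float, so in rare cases the result may differ from a true fused multiply-add.

```c
/* Evaluate a*b + c in double and round to float at the end.
   The product of two floats fits exactly in a double; the sum is
   rounded to double and then to float, so this approximates, but is
   not guaranteed to match, a single-rounding fused multiply-add. */
float mul_add_wide(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}
```

For example, with a = b = 1 + 2^-23 and c = -(1 + 2^-22), the exact product is 1 + 2^-22 + 2^-46. Rounding the product to float first would discard the 2^-46 term and give 0, whereas `mul_add_wide` keeps it and returns 2^-46.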

The fma function is useful if you want the precision rather than the speed. It will calculate the contracted result slowly but correctly even if it isn't available in hardware, and it should be inlined if it is available in hardware.
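If the software fallback is too slow for a given use, one possible pattern is to call fmaf only when the implementation reports it as fast. This sketch relies on the C99 macro FP_FAST_FMAF from <math.h>; whether it is defined depends on the compiler, the library, and the target flags, so treat that as an assumption to check on your platform. The name `mul_add_if_fast` is hypothetical.

```c
#include <math.h>

/* Use the single-rounding fmaf only when the implementation says it is
   about as fast as a separate multiply and add. Otherwise fall back to
   the plain expression, which the compiler may or may not contract
   depending on its FP_CONTRACT / -ffp-contract settings. */
float mul_add_if_fast(float a, float b, float c) {
#ifdef FP_FAST_FMAF
    return fmaf(a, b, c);
#else
    return a * b + c;
#endif
}
```

The trade-off is that the two branches can give results that differ in the last bit, so this fits code that wants speed first and treats the extra precision as a bonus.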

answered Dec 23 '15 at 13:40

gnasher729

I think this to some degree answers my first question about whether GCC is justified in using fma with `-O3`. But it's still not clear if it's IEEE compliant. And since GCC defines `__STDC_IEC_559__`, I can assume it's IEEE compliant, but other people claim fma breaks this (which would argue GCC is not justified in doing this when `__STDC_IEC_559__` is defined). So I am still confused. – Z boson Dec 23 '15 at 20:04
@Zboson: I noticed that stuff about the pragma in the doc I linked you, but didn't know how new or widely supported that was. That's why I didn't mention it earlier. – Peter Cordes Dec 23 '15 at 23:08
@PeterCordes, that's okay, GCC does not seem to care about that pragma anyway, so it's a moot issue. And in any case it says nothing about it being IEEE compliant. GCC defines `__STDC_IEC_559__` and at the same time uses `-ffp-contract=fast`, so I still want to know if this is a contradiction. – Z boson Dec 24 '15 at 14:07
