Why does GCC 10 no longer use mfence for std::atomic stores? Is xchg sufficient under the memory model?

Question

I recently discovered that GCC 10 no longer uses the mov + mfence sequence for seq_cst atomic stores, and instead uses xchg, which is implicitly locked when it has a memory operand. Is this sufficient under the memory model, so that nothing breaks in multithreaded code?

As an example, I compiled the following code on Godbolt with -O2, first with GCC 9.3 and then with GCC 10.2:

#include <stdint.h>
#include <atomic>

std::atomic_int32_t    idx;

int32_t increment(void) 
{
    return idx = (idx + 1);  // seq_cst load, then a separate seq_cst store
}

The results were the following:

GCC 9.3:

increment():
        mov     eax, DWORD PTR idx[rip]
        add     eax, 1
        mov     DWORD PTR idx[rip], eax
        mfence
        ret
idx:
        .zero   4

GCC 10.2:

increment():
        mov     eax, DWORD PTR idx[rip]
        add     eax, 1
        mov     edx, eax
        xchg    edx, DWORD PTR idx[rip]
        ret
idx:
        .zero   4
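To make the mapping concrete, here is a sketch of `increment()` with the implicit memory orders written out (the name `increment_explicit` is hypothetical). The assignment `idx = (idx + 1)` is a seq_cst load followed by a separate seq_cst store, and only the store needs the trailing full barrier; the comments note which instructions the two listings above show for each step.

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::int32_t> idx{0};

// Hypothetical name: same behavior as `return idx = (idx + 1);`,
// with the default memory_order_seq_cst written out explicitly.
std::int32_t increment_explicit() {
    // seq_cst load + add: a plain mov + add in both listings
    std::int32_t v = idx.load(std::memory_order_seq_cst) + 1;
    // seq_cst store: mov + mfence with GCC 9.3, xchg with GCC 10.2
    idx.store(v, std::memory_order_seq_cst);
    // atomic operator= returns the stored value, so return it here too
    return v;
}
```

Note that the load and the store are still two separate atomic operations, so another thread can modify `idx` in between.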

Could someone enlighten me, or just point me to the relevant section of the programming manual?

With best regards

Edit: OK, the memory-model part is answered by the two linked threads.

But my other question remains: why did it change only now, with GCC 10? The Skylake-related issues mentioned there are not new either.
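On the memory-model side, the property both sequences must provide for a seq_cst store can be illustrated with the classic store-buffering litmus test, sketched below (the function name is hypothetical). The C++ memory model forbids the outcome `r1 == 0 && r2 == 0` when all four operations are seq_cst. On x86 that guarantee comes from the full barrier after the store, and a locked `xchg` is a full barrier just like `mfence`. A finite number of runs can only fail to find a violation; it cannot prove there is none.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Returns true if the outcome forbidden
// for seq_cst (both loads reading 0) was observed in this run.
bool store_buffering_trial() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // needs StoreLoad barrier on x86
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    // join() synchronizes, so reading r1 and r2 afterwards is race-free
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

With `memory_order_release` stores and `memory_order_acquire` loads instead, `r1 == 0 && r2 == 0` becomes allowed, and on x86 it can actually be observed, because a plain `mov` store may still sit in the store buffer when the later load executes.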

Are you expecting the increment to be atomic? You have combined two atomic operations into a non-atomic operation. — David Schwartz, Apr 09 '21 at 20:26
That's not what I'm interested in. In the generated assembly, the store was first followed by an mfence to enforce store-load ordering, and now with GCC 10 it is implemented as xchg (which is implicitly locked when it operates on memory). — Tobias Off, Apr 09 '21 at 20:43
GCC changed to `xchg` because it's faster than `mov`+`mfence` on most CPUs most of the time, or at least not worse. In fact, a dummy `lock or byte [rsp], 0` may be faster than `mfence` when you need a stand-alone barrier like `atomic_thread_fence(mo_seq_cst)` that isn't part of a store. I thought my linked answer on the first duplicate mentioned all that. — Peter Cordes, Apr 09 '21 at 21:03
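Regarding the first comment: if the increment itself is meant to be atomic, the usual fix is a single read-modify-write such as `fetch_add`, sketched below. The `hammer` helper is hypothetical and exists only to exercise the counter from several threads. On x86, GCC typically emits `lock xadd` for a `fetch_add` whose result is used.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<std::int32_t> counter{0};

// One indivisible read-modify-write, unlike `counter = counter + 1`,
// which is a load followed by a separate store.
std::int32_t increment_atomic() {
    return counter.fetch_add(1) + 1;  // fetch_add returns the old value
}

// Hypothetical test helper: `threads` threads each increment
// `per_thread` times; returns the final counter value.
std::int32_t hammer(int threads, int per_thread) {
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) increment_atomic();
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

With the original load-then-store version, concurrent increments can overwrite each other and the final count can come out lower than expected.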
