I've been trying to wrap my head around atomics in C++, and while trying to understand them, I noticed something with a simple example that doesn't make sense to me.
Compiling the sample C++ code...:
#include <atomic>
std::atomic<int> a, b, c;
void variable_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    a.store(1, std::memory_order_release); // release
    b.store(456, std::memory_order_relaxed);
}

void fence_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // release
    a.store(1, std::memory_order_relaxed);
    b.store(456, std::memory_order_relaxed);
}
... results in essentially identical assembly for both GCC (11.2) and Clang (13.0.0) with -O3 -march=native:
variable_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
fence_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
c:
        .zero   4
b:
        .zero   4
a:
        .zero   4
I can understand the assembly generated for fence_release because of the following quote from cppreference:

an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.
This would seem to imply that, because c is written between the two stores to b, the second store to b (a subsequent store) cannot be reordered above c.store(...) (a preceding write). The two stores to b can therefore never become adjacent, so there is no way to avoid the first b.store(123, ...) altogether.
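To make that concrete, here is how I read the quoted rule when applied to fence_release; the annotated copy below is just my own illustration, reusing the same std::atomic<int> globals a, b and c from above:

// My reading of the cppreference rule, applied to fence_release.
// The annotations are mine, not anything the compiler reports.
void fence_release_annotated() {
    b.store(123, std::memory_order_relaxed);              // preceding write
    c.store(2, std::memory_order_relaxed);                // preceding write
    std::atomic_thread_fence(std::memory_order_release);
    a.store(1, std::memory_order_relaxed);                // subsequent store
    b.store(456, std::memory_order_relaxed);              // subsequent store: may not move above the
                                                          // preceding writes, so it can never become
                                                          // adjacent to b.store(123, ...) and make it dead
}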
However, I don't understand why the first store to b survives in variable_release. A store-release only prevents preceding operations from being reordered after it; it does not stop later operations from moving up before it. So I would expect the second store to b to be free to move above the release store to a and above the store to c, at which point the redundant b.store(123, ...) could be eliminated.
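Roughly, the transformation I expected to be allowed looks like this (entirely hypothetical; the function name and the rewrite are mine, and it is not what GCC or Clang actually emit here):

// Hypothetical rewrite of variable_release that I expected to be legal:
// hoist b.store(456, ...) above the release store and above c.store(...),
// at which point b.store(123, ...) is dead and can be dropped.
void variable_release_expected() {
    b.store(456, std::memory_order_relaxed); // hoisted; replaces the now-dead b.store(123, ...)
    c.store(2, std::memory_order_relaxed);
    a.store(1, std::memory_order_release);   // still release-publishes both stores above it
}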
Can the write b.store(123, ...) be elided in either of these functions? Am I misunderstanding something, or is this a missed compiler optimization?