Consider the following `spin_lock()` implementation, originally from this answer:

```c
void spin_lock(volatile bool* lock) {
    for (;;) {
        // inserts an acquire memory barrier and a compiler barrier
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;
        while (*lock)  // no barriers; is it OK?
            cpu_relax();
    }
}
```
What I already know:

- `volatile` prevents the compiler from optimizing out the re-read of `*lock` on each iteration of the `while` loop;
- `volatile` inserts neither memory nor compiler barriers;
- such an implementation actually works in GCC for x86 (e.g. in the Linux kernel) and on some other architectures;
- at least one memory and compiler barrier is required in a `spin_lock()` implementation for a generic architecture; this example inserts them in `__atomic_test_and_set()`.
Questions:

1. Is `volatile` enough here, or are there architectures or compilers where a memory barrier, a compiler barrier, or an atomic operation is required in the `while` loop?
   1.1 According to the C++ standards?
   1.2 In practice, for known architectures and compilers, specifically for GCC and the platforms it supports?
2. Is this implementation safe on all architectures supported by GCC and Linux? (It is at least inefficient on some architectures, right?)
3. Is the `while` loop safe according to C++11 and its memory model?
There are several related questions, but I was unable to construct an explicit and unambiguous answer from them:
Q: Memory barrier in a single thread

> In principle: Yes, if program execution moves from one core to the next, it might not see all writes that occurred on the previous core.
Q: memory barrier and cache flush

> On pretty much all modern architectures, caches (like the L1 and L2 caches) are ensured coherent by hardware. There is no need to flush any cache to make memory visible to other CPUs.
Q: Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?
Q: Do you expect that future CPU generations are not cache coherent?