A lock-free stack can be implemented as a singly linked list. This seems simple until we have to think about what to do with nodes after they've been popped. One strategy is to simply move them to a per-stack LIFO freelist (from which nodes can be reused by subsequent push operations) until eventually all threads are done with the stack, at which point a single thread destroys all nodes in the stack and all nodes in the freelist. Boost.Lockfree uses this strategy. So does Chris Wellons's C11 implementation. I will refer to the latter, because it's easier to read and the details are essentially the same, since C11 atomics are very similar to C++11 atomics.
In Wellons's implementation, which can be found on GitHub here, all lstack_node objects are non-atomic. In particular, this means that all accesses to the next member of an lstack_node object are non-atomic. What I am unable to understand is: why is it that such accesses never race with each other?
The next member is read at lstack.c:30. It is written at lstack.c:39. If these two lines can execute concurrently on the same lstack_node object, then the program contains a race. Is this possible? It seems possible to me:
- Thread 1 calls
lstack_pop, which callspop. It atomically loads the head node's value into the local variableorig. Now,orig.nodeis a pointer to the node that was at the top of the stack just now. (Note that up until this point, only local variables have been modified, so it is impossible for anything that thread 1 has done so far to make a CAS fail in any other thread.) Meanwhile... - Thread 2 calls
lstack_pop.popsucceeds and returnsnode, a pointer to the node that has just been excised from the stack; this is the same node thatorig.nodepoints to in thread 1. It then begins to callpushin order to addnodeto the freelist. The freelist head node is atomically loaded, andnode->nextis set to point to the first node in the freelist. - Oops. This races with the read to
orig.node->nextin thread 1.
Could Wellons's implementation simply be wrong? I doubt it. If his implementation is wrong, then so is the Boost one, because the only way to fix (what appears to me to be) the race condition is to make next atomic. But I don't think the Boost implementation could be wrong in such a basic way without it having been noticed and fixed by now. So I must have made a mistake in my reasoning.