C++11 atomic: why does this code work?

Question

Let's take this struct:

struct entry {
    std::atomic<bool> valid;
    std::atomic_flag writing;
    char payload[128];
};

Two threads A and B concurrently access this struct in the following way (let e be an instance of entry):

if (e.valid) {
    // do something with e.payload...
} else {
    while (e.writing.test_and_set(std::memory_order_acquire));
    if (!e.valid) {
       // write e.payload one byte at a time
       // (the payload written by A may be different from the payload written by B)
       e.valid = true;
       e.writing.clear(std::memory_order_release);
    }
}

I guess that this code is correct and does not present issues, but I want to understand why it works.

Quoting the C++ standard (29.3.13):

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Now, bearing this in mind, imagine that both thread A and B enter the else block. Is this interleave possible?

1. Both A and B enter the else branch, because valid is false
2. A sets the writing flag
3. B starts to spin lock on the writing flag
4. A reads the valid flag (which is false) and enters the if block
5. A writes the payload
6. A writes true on the valid flag; obviously, if A reads valid again, it would read true
7. A clears the writing flag
8. B sets the writing flag
9. B reads a stale value of the valid flag (false) and enters the if block
10. B writes its payload
11. B writes true on the valid flag
12. B clears the writing flag

I hope this is not possible, but when it comes to actually answering the question "why is it not possible?", I'm not sure of the answer. Here is my idea.

Quoting from the standard again (29.3.12):

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

atomic_flag::test_and_set() is an atomic read-modify-write operation, as stated in 29.7.5.

Since atomic_flag::test_and_set() always reads a "fresh value", and I'm calling it with the std::memory_order_acquire memory ordering, then I cannot read a stale value of the valid flag, because I must see all the side-effects caused by A before the atomic_flag::clear() call (which uses std::memory_order_release).

Am I correct?

Clarification. My whole reasoning (wrong or correct) relies on 29.3.12. From what I have understood so far, if we ignore the atomic_flag, reading stale data from valid is possible even if it's atomic. atomic doesn't seem to mean "always immediately visible" to every thread. The maximum guarantee you can ask for is a consistent order in the values you read, but you can still read stale data before getting the fresh one. Fortunately, atomic_flag::test_and_set() and every exchange operation have this crucial feature: they always read fresh data. So, only if you acquire/release on the writing flag (not only on valid) do you get the expected behavior. Do you see my point (correct or not)?

EDIT: my original question included the following few lines that gained too much attention if compared to the core of the question. I leave them for consistency with the answers that have been already given, but please ignore them if you are reading the question right now.

~~Is there any point in valid being an atomic<bool> and~~ ~~not a plain bool? Moreover, if it should be an atomic<bool>,~~ ~~what is its 'minimum' memory ordering constraint that will not present~~ ~~issues?~~

The C++11 memory model thinks in terms of "ordering", not "visibility", because ordering is enough as long as visibility happens eventually. (I think you understand this after your conversation with @Grizzly, but I felt like reiterating it here.) This code is fine as written. – Nemo Sep 02 '13 at 05:18

score 5 · Accepted Answer · edited May 23 '17 at 12:24

Inside the else branch, valid should be protected by the acquire/release semantics imposed by the operations on writing. However, this does not obviate the need to make valid an atomic:

You forgot to include the first line (if (e.valid)) in your analysis. If valid were a bool instead of atomic<bool>, this access would be completely unprotected. Therefore you could have the situation where a change of valid becomes visible to other threads before the payload is completely written/visible. This means that a thread B could evaluate e.valid to true and enter the "do something with e.payload" branch while the payload isn't completely written yet.

Other than that, your analysis seems somewhat reasonable but not entirely correct to me. The thing to remember with memory ordering is that acquire and release semantics pair up: everything written before a release operation can safely be read after an acquire operation on the same variable reads the modified value. With that in mind, the release semantics on writing.clear(...) ensure that the write to valid must be visible when the loop on writing.test_and_set(...) exits, since the latter reads the change of writing (the write done in writing.clear(...)) with acquire semantics and doesn't exit before that change is visible.
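A minimal sketch of that pairing, assuming a hypothetical one-producer/one-consumer setup. The names payload, ready and publish_and_read are made up for illustration and are not part of the question's code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, chosen only for this illustration.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag written with release semantics

// Runs one producer and one consumer; returns what the consumer read.
int publish_and_read() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // (1) plain write
        ready.store(true, std::memory_order_release);  // (2) release store
    });
    std::thread consumer([&seen] {
        // (3) spin until the acquire load reads the value stored by (2)
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) ordered after (1) through the (2)/(3) pair
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling publish_and_read() should return 42: the consumer can only leave its loop after reading the value stored by the release operation, and that orders the plain write to payload before the plain read.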

Regarding §29.3.12: It is relevant to the correctness of your code, but unrelated to reading a stale valid flag. You can't set the flag before the clear, so acquire-release semantics will ensure correctness there. §29.3.12 protects you from the following scenario:

Both A and B enter the else branch, because valid is false

A sets the writing flag

B sees a stale value for writing and also sets it

Both A and B read the valid flag (which is false), enter the if block and write the payload creating a race condition
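The scenario above is the mutual-exclusion half of the argument. As a rough sketch, assuming a hypothetical plain counter guarded by a test_and_set spin lock (lock_flag, counter and count_with_spinlock are illustrative names, not from the question), the non-atomic increments never overlap because two threads cannot both read the flag as clear:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical names, chosen only for this illustration.
std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;
long counter = 0;  // non-atomic; only touched while lock_flag is held

// Increments the counter from several threads under the spin lock
// and returns the final value.
long count_with_spinlock(int threads, int iterations) {
    counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                // The read-modify-write reads the latest value in the flag's
                // modification order, so only one thread can see "clear".
                while (lock_flag.test_and_set(std::memory_order_acquire)) {
                }
                ++counter;  // critical section
                lock_flag.clear(std::memory_order_release);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With 4 threads and 10000 iterations each, count_with_spinlock(4, 10000) should return 40000, since no increment is lost.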

Edit: Regarding the minimal ordering constraints: acquire for the loads and release for the stores should probably do the job; however, depending on your target hardware, you might as well stay with sequential consistency. For the difference between those semantics look here.

edited May 23 '17 at 12:24 by Community · answered Jun 04 '13 at 18:12 by Grizzly

+1 Ok, I definitely see why valid should be atomic (in fact I wrote it so). I still find it pretty interesting that both of you who have answered so far concentrated on the last few lines. Actually I find it pretty interesting that this whole code works thanks to a not-so-famous feature of atomic_flag (and of all the exchange operations): they are the only ones forced to see "fresh" values – gd1 Jun 04 '13 at 18:41
@gd1: What is missing from the answers in your opinion? The only question I can see not in the last few lines of your post is the "Is this interleave possible?" The answer to that is pretty clear when saying that your analysis seems ok. However I do not understand your comment about the "not-so-famous feature of the atomic_flag". The code should work because of the acquire/release semantics, nothing more. – Grizzly Jun 04 '13 at 18:45
My whole reasoning (wrong or correct) relies on 29.3.12. Reading stale data from "valid" is possible even if it's atomic. Atomic doesn't seem to mean "immediately visible". The maximum guarantee you can ask for is a consistent order in what you read, but you can still read stale data for a month before getting the fresh one. :) Fortunately, atomic_flag::test_and_set() and every exchange operation have this special property: they always read fresh data [29.3.12]. So, only if you acquire/release on "writing" (not only on "valid") do you get the expected behavior. Do you see my point [correct or not]? – gd1 Jun 04 '13 at 21:53
@gd1: Ok, maybe I misread your question. What I see as the main part of your reasoning regarding the point mentioned in your question is "I'm calling it with the std::memory_order_acquire memory ordering, then I cannot read a stale value of the valid flag, because I must see all the side-effects caused by A before the atomic_flag::clear() call (which uses std::memory_order_release)". `test_and_set` always reading fresh data doesn't matter regarding `valid`, since the write to `valid` is guaranteed to be visible before/when the change to `writing` is visible. – Grizzly Jun 05 '13 at 14:18
§29.3.12 is still relevant, since it keeps two threads from simultaneously taking the "lock". Without that guarantee two threads could simultaneously arrive at `while(e.writing.test_and_set(std::memory_order_acquire))`, both see the flag as clear and both set it (possible if the set operation from the other thread takes some time to become visible), thus both trying to fill payload. – Grizzly Jun 05 '13 at 14:23
I think we are basically saying the same thing in two different ways. I'm trying to figure out if `writing` also guarantees the freshness of `valid` **in the second check**, besides the freshness of itself. I think so and your answer just confirms this. The reason why I quote §29.3.12 is just because in order to read a fresh `valid` you have to read a fresh `writing` too, in order to synchronize with *the last* `clear()`. §29.3.12 gives you that. What's wrong with this analysis? If you read the *clarification* in my question, I think that it basically matches your edits to the answer. – gd1 Jun 10 '13 at 18:30
@gd1: You loop till you see `writing` as clear. It doesn't matter (for this part of the analysis) whether that value is fresh or not (as mentioned: Without §29.3.12 it could be stale because it has already been set again), if you read `writing` as clear then you also see the write to `valid` (unless you are the first thread to enter that block of course). So the acquire/release semantics are the crucial point for the problem you mentioned, while §29.3.12 solves a different problem. So yes §29.3.12 is crucial to the correctness of your code, it just is important for a different reason – Grizzly Jun 11 '13 at 07:39
OK now I get it. Thanks for this interesting discussion. – gd1 Jun 11 '13 at 13:25

score 2 · Answer 2 · answered Jun 04 '13 at 19:49

Section 29.3.12 has nothing to do with why this code is correct or incorrect. The section you want (in the draft version of the standard available online) is Section 1.10: "Multi-threaded executions and data races." Section 1.10 defines a happens-before relation on atomic operations, and on non-atomic operations with respect to atomic operations.

Section 1.10 says that if two conflicting operations in different threads, at least one of which is non-atomic, have no happens-before relationship between them, then you have a data race. It further declares (Paragraph 21) that any program with a data race has undefined behavior.

If e.valid is not atomic then you have a data race between the first line of code and the line e.valid=true. So all of your reasoning about the behavior in the else clause is incorrect (the program has no defined behavior so there is nothing to reason about.)

On the other hand, if all of your accesses to e.valid were protected by atomic operations on e.writing (as if the else clause were your whole program), then your reasoning would be correct. Event 9 in your list could not happen. But the reason is not Section 29.3.12; it is again Section 1.10, which says that your non-atomic operations will appear to be sequentially consistent if there are no data races.

The pattern you are using is called double-checked locking. Before C++11 it was impossible to implement double-checked locking portably. In C++11 you can make double-checked locking work correctly and portably. The way you do it is by declaring valid to be atomic.
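As a rough sketch of the pattern with explicit orderings, assuming the entry struct from the question plus a hypothetical harness (fill_once and the writers counter are illustrative additions). One deliberate deviation: the sketch clears the flag on every path, which the question's two-thread version does not need but which keeps a third thread from spinning forever.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>
#include <thread>
#include <vector>

struct entry {
    std::atomic<bool> valid{false};
    std::atomic_flag writing = ATOMIC_FLAG_INIT;
    char payload[128] = {};
};

// Hypothetical harness: returns how many threads wrote the payload,
// or -1 if the payload ended up mixed between writers.
int fill_once(int threads) {
    entry e;
    std::atomic<int> writers{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&e, &writers, t] {
            // First check: acquire pairs with the release store below.
            if (e.valid.load(std::memory_order_acquire)) {
                return;  // payload already published
            }
            while (e.writing.test_and_set(std::memory_order_acquire)) {
            }
            // Second check: ordered by the acquire/release pair on writing,
            // so a relaxed load is assumed to be sufficient here.
            if (!e.valid.load(std::memory_order_relaxed)) {
                std::memset(e.payload, 'a' + t % 26, sizeof e.payload);
                writers.fetch_add(1, std::memory_order_relaxed);
                e.valid.store(true, std::memory_order_release);
            }
            e.writing.clear(std::memory_order_release);
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    for (char c : e.payload) {
        if (c != e.payload[0]) {
            return -1;  // bytes from two different writers
        }
    }
    return writers.load();
}
```

fill_once(8) should return 1: whichever thread takes the flag first writes the payload, and every later thread sees valid as true either on the fast path or after taking the flag.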

I wish I hadn't written the last few lines of my original question. :) However this answer is *exactly* what I was looking for. I read section 1.10, in particular 1.10.21. It actually says that "programs that use mutexes and memory_order_seq_cst to prevent all data races behave as if the operations ... were simply interleaved", which is equivalent to "your non-atomic operations will appear to be sequentially consistent if there are no data races", but total consistent ordering doesn't mean immediate visibility, which seems to be the topic of 29.3.12. – gd1 Jun 04 '13 at 21:32
I mean, the standard says that you see a total, consistent order. But 1) only if you use memory_order_seq_cst and 2) you can still see stale data, because even if you get that order, you can get the fresh data one month later, whatever "reasonable amount of time" means. It seems to me that Section 29.3.12, particularly the words "shall always read the last value", is what saves one's ass, because if you read the last value of atomic_flag AND you choose std::memory_order_acquire, THEN you get all the stuff that has been written before the corresponding release on the same atomic_flag. – gd1 Jun 04 '13 at 21:37
29.3.12 and 29.3.13 only describe the behavior of a _single_ atomic variable. They ensure that any individual atomic variable is _coherent_. Section 1.10 is what describes the _consistency model_ (the relationship between reads and write visibility to _different_ variables). Section 1.10 says that if Thread B sees an atomic release by Thread A then it also sees all of Thread A's writes (atomic or not) done before the atomic release. – Wandering Logic Jun 04 '13 at 23:08

score 1 · Answer 3 · edited Jun 04 '13 at 20:22

If valid is not atomic then the initial read of e.valid on the first line conflicts with the assignment to e.valid.

There is no guarantee that both threads have already done that read before one of them gets the spinlock, i.e. steps 1 and 6 are not ordered.

edited Jun 04 '13 at 20:22 by taocp · answered Jun 04 '13 at 18:19 by Jonathan Wakely

Ok, that's about the last few lines of the question. What about the rest? Is it true that sec. 29.3.12 make the whole game work? :) – gd1 Jun 04 '13 at 18:33
The rest is irrelevant, if valid is not atomic or has undefined behaviour. The reason the operations on e.valid work is because it's atomic, not because there are read-modify-write ops on e.writing – Jonathan Wakely Jun 04 '13 at 20:37
@Jonathan Wakely: I agree with you on the fact that valid *must* be atomic, but I don't see why point 9 of my list is impossible without the read-modify-write ops in e.writing, which have the 'special' feature stated in 29.3.12. Can you elaborate further? – gd1 Jun 04 '13 at 21:20
I believe thread B cannot read a stale value because in thread A 6 _happens before_ 7, and 7 _synchronizes with_ 8 (due to a store-release and load-acquire pair), and in thread B 8 _happens before_ 9, so 6 _inter-thread happens before_ 9, and the value must be `true`. This is not reliant on the RMW property. – Jonathan Wakely Jun 04 '13 at 23:50
Indeed. What I was trying to tell you is that 1) atomic_flag::test_and_set (like any exchange operation) always reads 'the freshest' data (sec. 29.3.12); 2) so, if used with load/acquire, 'writing' actually guarantees that B reads the 'last' value of 'valid'. This may seem obvious to you but I think that it is not 100% clear at first glance. Many people (including me) may think that 'valid' is actually protecting the payload from being re-written by B. Instead 'writing' is ensuring that 'valid' is read up-to-date by B. This is tricky in a way. – gd1 Jun 05 '13 at 00:44

score 1 · Answer 4 · answered Sep 02 '13 at 05:00

The store to e.valid needs to be a release and the load in the condition needs to be an acquire. Otherwise, the compiler/processor is free to reorder the store to e.valid above the writes to the payload. There is an open-source tool, CDSChecker, for verifying code like this against the C/C++11 memory model.
