Is there any wording in the standard that guarantees that relaxed stores to atomics won't be lifted above the locking of a mutex? If not, is there any wording that explicitly says that it's kosher for the compiler or CPU to do so?
For example, take the following program (which could potentially use acq/rel for foo_has_been_set and avoid the lock, and/or make foo itself atomic.  It's written this way to illustrate this question.)
std::mutex mu;
int foo = 0;  // Guarded by mu
std::atomic<bool> foo_has_been_set{false};
void SetFoo() {
  mu.lock();
  foo = 1;
  foo_has_been_set.store(true, std::memory_order_relaxed);
  mu.unlock();
}
void CheckFoo() {
  if (foo_has_been_set.load(std::memory_order_relaxed)) {
    mu.lock();
    assert(foo == 1);
    mu.unlock();
  }
}
Is it possible for CheckFoo to crash in the above program if another thread is calling SetFoo concurrently, or is there some guarantee that the store to foo_has_been_set can't be lifted above the call to mu.lock by the compiler and CPU?
This is related to an older question, but it's not 100% clear to me that the answer there applies to this. In particular, the counter-example in that question's answer may apply to two concurrent calls to SetFoo, but I'm interested in the case where the compiler knows that there is one call to SetFoo and one call to CheckFoo. Is that guaranteed to be safe?
I'm looking for specific citations in the standard.