I'm working on a runtime library that uses user-level context switching (via Boost.Context), and am having trouble with thread_local variables. Consider the following (reduced) code:
thread_local int* volatile tli;
int main()
{
    tli = new int(1);   // part 1, done by thread 1
    UserLevelContextSwitch();
    int li = *tli;      // part 2, done by thread 2
    cout << li;
}
Since there are two accesses to the thread_local variable, the main function is transformed by the compiler to something along these lines (reversed from assembly):
register int** ptli = &tli; // cache address of thread_local variable
*ptli = new int(1);
UserLevelContextSwitch();
int li = **ptli;
cout << li;
This seems to be a legal optimization, since the value of the volatile tli is not being cached in a register. But the address of tli is in fact being cached, rather than being looked up again in part 2.
And that's the problem: after the user-level context switch, the thread that did part 1 goes somewhere else. Part 2 is then picked up by some other thread, which gets the previous stack and register state. But now the thread executing part 2 (thread 2) goes through the cached address and reads the tli that belongs to thread 1.
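To make the failure mode concrete, here is a minimal sketch that uses plain std::thread instead of Boost.Context. The names tl_value, address_in_new_thread and addresses_differ_across_threads are made up for illustration. It only shows that each OS thread has its own instance of a thread_local variable, so an address computed on one thread does not refer to the instance another thread would see.

```cpp
#include <cstdint>
#include <thread>

thread_local int tl_value = 0;  // illustrative stand-in for tli

// Returns the address of tl_value as observed from a freshly started thread.
// The address is stored as an integer because the thread's instance no
// longer exists once the thread has been joined.
std::uintptr_t address_in_new_thread() {
    std::uintptr_t addr = 0;
    std::thread t([&addr] { addr = reinterpret_cast<std::uintptr_t>(&tl_value); });
    t.join();
    return addr;
}

// True when the calling thread and a second thread see distinct instances.
bool addresses_differ_across_threads() {
    std::uintptr_t mine = reinterpret_cast<std::uintptr_t>(&tl_value);
    return mine != address_in_new_thread();
}
```

A pointer obtained before the context switch plays the role of "mine" here: if the code resumes on another thread, that pointer still designates the first thread's instance.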
I'm trying to figure out a way to prevent the compiler from caching the thread-local variable's address, and volatile doesn't go deep enough. Is there any trick (preferably standard, possibly GCC-specific) to prevent the caching of the thread-local variables' addresses?
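One GCC-flavoured direction that might be worth trying is to route every access through a non-inlined accessor function, so the address is re-derived on each call. The sketch below is an assumption-laden illustration, not a guaranteed fix: current_tli and demo_same_thread are hypothetical names, the noinline attribute and the empty asm statement are GCC/Clang extensions, and whether the compiler really recomputes the address at every call site can depend on compiler version and flags (LTO, for instance).

```cpp
thread_local int* volatile tli;  // as in the question

// Hypothetical accessor: computes the address of tli inside its own frame.
// noinline keeps the TLS address computation out of the caller. The empty
// asm statement makes the pointer opaque and clobbers memory, so that the
// compiler should not treat the function as const/pure and merge two calls.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
int* volatile* current_tli() {
    int* volatile* p = &tli;
#if defined(__GNUC__)
    __asm__ volatile("" : "+r"(p) : : "memory");
#endif
    return p;
}

// Usage sketch: both accesses go through the accessor, so neither should
// reuse an address computed before switch_context() ran.
// switch_context is a placeholder for the real user-level switch.
int demo_same_thread(void (*switch_context)()) {
    static int one = 1;
    *current_tli() = &one;     // part 1
    switch_context();
    return **current_tli();    // part 2
}
```

Every direct use of tli in code that may migrate between threads would have to be replaced by *current_tli(), which is intrusive. The cost is an extra call per access.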