Do I need the volatile qualifier for variables only accessed while a lock is held?
In this code, could removing the volatile qualifier from n change the behavior when concurrent_foo is executed concurrently?
#ifndef __GNUC__
#error __sync_lock builtins are only available with GCC
#endif
volatile int n = 0;
static volatile int lock = 0;
void concurrent_foo () {
    while (__sync_lock_test_and_set (&lock, 1));
    // Non-atomic operation, protected by spinlock above.
    int x = n % 2 + 1;
    n = n + x;
    __sync_lock_release (&lock);
}
I understand that the volatile qualifier instructs the compiler not to optimize memory accesses to a variable.  I also understand that the __sync_lock builtins issue a (full?) memory barrier, which memory accesses should not cross.  However, it would be safe in this example code to fetch n, cache it in a register, compute the new value, and then write it back to n.
Compiling with GCC to i686 assembly using -O3 reveals that n is fetched from memory twice, unnecessarily:
concurrent_foo:
        movl        $1, %edx
.L2:
        movl        %edx, %eax
        xchgl        lock, %eax
        testl        %eax, %eax
        jne        .L2
        movl        n, %eax
        movl        n, %edx
        movl        %eax, %ecx
        shrl        $31, %ecx
        addl        %ecx, %eax
        andl        $1, %eax
        subl        %ecx, %eax
        leal        1(%edx,%eax), %eax
        movl        %eax, n
        movl        $0, lock
        ret
Without the volatile qualifier I get subtly different code, where n is fetched just once:
concurrent_foo:
        movl        $1, %edx
.L2:
        movl        %edx, %eax
        xchgl        lock, %eax
        testl        %eax, %eax
        jne        .L2
        movl        n, %edx
        movl        %edx, %ecx
        shrl        $31, %ecx
        leal        (%edx,%ecx), %eax
        andl        $1, %eax
        subl        %ecx, %eax
        leal        1(%edx,%eax), %eax
        movl        %eax, n
        movl        $0, lock
        ret
In both cases, the memory accesses to n occur while the lock is held, and thus should be "correct".  However, I am unsure whether that is really guaranteed once volatile is removed. The volatile qualifier is preventing a performance optimization that I would like, and that would not affect the outcome of the operation (after the first call, n would never be even either way).
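For comparison, here is a minimal sketch of doing the caching by hand while keeping the volatile qualifier. The function name concurrent_foo_cached and the local variable cached are hypothetical and not part of my original code. Since the source then contains exactly one read and one write of n, I would expect the compiler to emit a single fetch, but this does not answer whether volatile can be dropped altogether.

```c
/* Sketch only: concurrent_foo_cached and cached are hypothetical names
   introduced for illustration. Requires GCC (or a compiler that
   provides the legacy __sync builtins). */
volatile int n = 0;
static volatile int lock = 0;

void concurrent_foo_cached (void) {
    /* Spin until the previous value of lock was 0, i.e. we acquired it. */
    while (__sync_lock_test_and_set (&lock, 1))
        ;
    /* Single explicit read of the volatile variable into a local. */
    int cached = n;
    int x = cached % 2 + 1;
    /* Single explicit write back to the volatile variable. */
    n = cached + x;
    __sync_lock_release (&lock);
}
```

Called three times from a single thread, this takes n from 0 to 1, 3 and then 5, the same sequence as the original concurrent_foo.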
