GCC implements __sync_val_compare_and_swap on PowerPC[64] as:
        sync
    1:  lwarx   9,0,3
        cmpw    0,9,4
        bne     0,2f
        stwcx.  5,0,3
        bne     0,1b
    2:  isync
The GCC documentation says of the __sync_* builtins:
In most cases, these builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.
However, the use of isync rather than sync at the end bothers me. Is this actually a full barrier? Specifically:
Could loads performed after the __sync_val_compare_and_swap fail to see stores performed before the store that produced the value the __sync_val_compare_and_swap loaded?
Could stores performed after the __sync_val_compare_and_swap be seen by other threads before they see the value stored by the __sync_val_compare_and_swap?
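To make the two questions concrete, here is a minimal sketch of both scenarios as litmus tests in C. Everything in it is hypothetical and for illustration only: the variables `x` and `y`, the thread functions, and the `run_once` helper are not from any real code. The other thread in each scenario uses `__sync_synchronize()` so that only the ordering around the compare-and-swap is in question.

```c
#include <pthread.h>

/* Hypothetical shared variables; the names are illustrative only. */
static volatile int x;  /* ordinary data */
static volatile int y;  /* the word the compare-and-swap operates on */

/* Scenario 1: x is stored before the store to y whose value the CAS loads. */
static void *s1_writer(void *arg) {
    (void)arg;
    x = 1;                  /* store performed before ... */
    __sync_synchronize();   /* full barrier on the writer side */
    y = 1;                  /* ... the store that produces the value the CAS loads */
    return NULL;
}

static void *s1_cas_thread(void *arg) {
    int *seen = arg;
    /* Spin until the CAS loads the value 1 (and swaps in 2). */
    while (__sync_val_compare_and_swap(&y, 1, 2) != 1)
        ;
    *seen = x;              /* load performed after the CAS: can it still be 0? */
    return NULL;
}

/* Scenario 2: x is stored after the CAS; can x == 1 be seen before y == 1? */
static void *s2_cas_thread(void *arg) {
    (void)arg;
    __sync_val_compare_and_swap(&y, 0, 1);  /* succeeds, since y starts at 0 */
    x = 1;                                  /* store performed after the CAS */
    return NULL;
}

static void *s2_observer(void *arg) {
    int *seen = arg;
    while (x != 1)          /* wait until the later store is visible */
        ;
    __sync_synchronize();   /* full barrier on the observer side */
    *seen = y;              /* can this still be 0? */
    return NULL;
}

/* Runs one round: thread a gets NULL, thread b gets a slot for its result. */
static int run_once(void *(*a)(void *), void *(*b)(void *)) {
    pthread_t ta, tb;
    int seen = -1;
    x = 0;
    y = 0;
    pthread_create(&ta, NULL, a, NULL);
    pthread_create(&tb, NULL, b, &seen);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return seen;
}

/* Each counter returns how many rounds observed the outcome being asked about. */
int count_scenario1_anomalies(int trials) {
    int n = 0;
    for (int i = 0; i < trials; i++)
        if (run_once(s1_writer, s1_cas_thread) != 1)
            n++;
    return n;
}

int count_scenario2_anomalies(int trials) {
    int n = 0;
    for (int i = 0; i < trials; i++)
        if (run_once(s2_cas_thread, s2_observer) != 1)
            n++;
    return n;
}
```

Calling `count_scenario1_anomalies` and `count_scenario2_anomalies` from a `main` (building with `-pthread`) reports how often each outcome was observed. A count of zero on a particular machine does not prove the outcome is forbidden; it only means it was not observed in that run, which is why the question is about what the instruction sequence guarantees.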