Answer to your addition.
Many tutorials talk about the visibility of volatile fields, saying things like
"a volatile field becomes visible to all readers (other threads in
particular) after a write operation on it completes". I have doubts
about this: how could a completed write to a field be invisible to other
threads (or CPUs)?
The compiler might mess up the code, e.g.:
boolean stop;

void run() {
    while (!stop) println();
}
first optimization (the load of stop is hoisted out of the loop):

void run() {
    boolean r1 = stop;
    while (!r1) println();
}
second optimization:

void run() {
    boolean r1 = stop;
    if (r1) return;
    while (true) println();
}
So now it is obvious that this loop will never stop, because effectively the new value of stop will never be seen. For a store you can do something similar that postpones it indefinitely.
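A minimal sketch of the fix (the Runner class name is mine, just for illustration): declaring stop volatile forbids the compiler from hoisting the load out of the loop, so every iteration performs a fresh load and the loop terminates once another thread sets the flag.

class Runner {
    // volatile forbids caching stop in a register across iterations;
    // every iteration of the loop performs a fresh load of the field.
    volatile boolean stop;

    void run() {
        while (!stop) System.out.println();
    }
}

Another thread doing runner.stop = true; is then guaranteed to eventually stop the loop, because the volatile load cannot be optimized away.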
As I understand it, a completed write means you have successfully
written the field back to the cache, and according to MESI, all other
threads should have an invalid cache line if this field has been
cached by them.
Correct. This is normally called 'globally visible' or 'globally performed'.
One exception (since I am not very familiar with the hardware, this
is just a conjecture) is that maybe the result will be written back to
a register instead of the cache, and I do not know whether there is some
protocol to keep consistency in this situation, or whether volatile
prevents writing to a register in Java.
All modern processors are load/store architectures (even X86 after the uops conversion), meaning that there are explicit load and store instructions that transfer data between registers and memory, and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit optimizations.
Suppose the compiler did not reorder it; what we see in thread2
is due to the store buffer, and I do not think a write operation sitting
in the store buffer counts as a completed write.
Because of the store buffer and invalidate queue strategy, the
write to variable A looks invisible, but in fact the write
operation has not finished when thread2 reads A.
On the X86 the order of the stores in the store buffer is consistent with the program order and they will commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order. Store buffers can be one source of reordering; out-of-order and speculative execution can be another.
Apart from the stores, reordering loads can also lead to observing stores out of order. On the X86 loads can't be reordered with other loads, but on the ARM it is allowed. And of course the JIT can mess things up as well.
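To make this concrete, here is a sketch of the classic store-buffering litmus test (class and field names are my own): without volatile, each thread's store can still sit in its store buffer while the subsequent load already executes, so r1 == 0 and r2 == 0 is an allowed outcome on the X86; on the ARM the loads can be reordered as well.

class StoreBuffering {
    int x, y;     // declaring both volatile forbids the r1 == 0 && r2 == 0 outcome
    int r1, r2;

    void thread1() {
        x = 1;    // this store may still sit in the store buffer...
        r1 = y;   // ...while this load already reads from the cache
    }

    void thread2() {
        y = 1;
        r2 = x;
    }
}

Tools such as jcstress exercise exactly these kinds of tests and regularly observe the 0/0 outcome on real hardware.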
Even if we make field B volatile, when we place the write to field
B into the store buffer together with memory barriers, thread2 can still read
b as 0 and finish.
It is important to realize that the JMM is based on sequential consistency: even though it is a relaxed memory model (it separates plain loads and stores from synchronization actions like volatile load/store and lock/unlock), if a program has no data races it will only produce sequentially consistent executions. For sequential consistency the real-time order doesn't need to be respected. So it is perfectly fine for a load/store to be skewed (a sketch follows the list below), as long as:
the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order.
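A small illustration of that skew (the class and field names are mine): even for a volatile field, a read that starts later in wall-clock time may still be ordered before the write in the memory order and return the old value; that execution is sequentially consistent as long as the three rules above hold.

class Skew {
    volatile int v;

    void writer() {      // suppose this runs first in wall-clock time
        v = 1;
    }

    void reader() {      // and this runs later in wall-clock time
        int r = v;       // r == 0 is still a legal outcome: the load may be
                         // placed before the store in the memory order, because
                         // sequential consistency says nothing about wall-clock time
    }
}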
To me, volatile looks like it is not about the visibility of the
field it is declared on, but more like an edge to make sure that all the
writes that happen before the volatile field write in ThreadA are visible to
all operations after the volatile field read (a volatile read that happens
after the volatile field write in ThreadA has completed) in another ThreadB.
You are on the right path.
Example.
int a = 0;
volatile int b = 0;

thread1() {
    1: a = 1;
    2: b = 1;
}

thread2() {
    3: r1 = b;
    4: r2 = a;
}
In this case there is a happens-before edge between 1-2 (program order). If r1 = 1, then there is a happens-before edge between 2-3 (volatile variable rule) and a happens-before edge between 3-4 (program order).
Because the happens-before relation is transitive, there is a happens-before edge between 1-4. So r2 must be 1.
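A runnable sketch of that example (the class name and the printed message are mine): because the happens-before chain runs from 1 to 4, a reader that observes b == 1 must also observe a == 1.

class MessagePassing {
    static int a = 0;
    static volatile int b = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread thread1 = new Thread(() -> {
            a = 1;          // 1: plain store
            b = 1;          // 2: volatile store
        });
        Thread thread2 = new Thread(() -> {
            int r1 = b;     // 3: volatile load
            int r2 = a;     // 4: plain load
            if (r1 == 1 && r2 == 0) {
                System.out.println("forbidden by the JMM");
            }
        });
        thread1.start();
        thread2.start();
        thread1.join();
        thread2.join();
    }
}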
volatile takes care of the following:
Visibility: needs to make sure the load/store doesn't get optimized out.
Atomicity: the load/store is atomic, so a load/store should not be observed partially (see the sketch after this list).
Ordering: most importantly, it needs to make sure that the order between 1-2 and 3-4 is preserved.
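The atomicity point matters most for long and double: without volatile the JMM allows a 64-bit write to be split into two 32-bit writes, so a reader could observe a half-written value. A minimal sketch (the class and field names are mine):

class Tearing {
    long plain;            // a concurrent reader may legally observe a torn value
    volatile long shared;  // volatile guarantees the 64-bit load/store is atomic

    void writer() {
        plain = -1L;       // may be seen as 0xFFFFFFFF00000000L or 0x00000000FFFFFFFFL
        shared = -1L;      // always seen as either 0L or -1L, never torn
    }
}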
By the way, since I am not a native speaker, I have seen many
tutorials in my mother language (and also some English tutorials) say
that volatile will instruct JVM threads to read the value of a volatile
variable from main memory and not cache it locally, and I do not
think that is true.
You are completely right. This is a very common misconception. Caches are the source of truth since they are always coherent. If every write needed to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in the cache and can be completely incoherent with the cache. Plain and volatile loads/stores go to the cache. It is possible to bypass the cache in special situations like MMIO or when using e.g. SIMD instructions, but that isn't relevant for these examples.
Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.
Most people here are not native speakers (I'm certainly not). Your English is good enough and you show a lot of promise.