Volatile isn't good enough in theory, but in practice it will always work because the operating system scheduler will always take a lock eventually. It also works well on a processor with a strong memory model, like x86, which burns a lot of juice to keep the caches of its cores synchronized.
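For reference, here is a minimal sketch of the volatile bool pattern under discussion. It assumes a worker that does nothing but poll the flag. The class and member names (`VolatileFlagWorker`, `RequestStop`, `Iterations`) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical worker that polls a volatile stop flag; names are illustrative only.
class VolatileFlagWorker
{
    // 'volatile' stops the JIT from caching the field in a register or
    // hoisting the read out of the loop, and gives the read acquire semantics.
    private volatile bool _stopRequested;

    public long Iterations { get; private set; }

    public void Run()
    {
        long count = 0;
        while (!_stopRequested)
        {
            count++; // stand-in for real work
        }
        Iterations = count;
    }

    // Called from the control thread.
    public void RequestStop() => _stopRequested = true;

    static void Main()
    {
        var worker = new VolatileFlagWorker();
        var thread = new Thread(worker.Run);
        thread.Start();
        Thread.Sleep(50);
        worker.RequestStop();
        thread.Join(); // Join also makes Iterations visible to this thread
        Console.WriteLine($"Stopped after {worker.Iterations} iterations");
    }
}
```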
So the only thing that really matters is how quickly a thread responds to the stop request. That is easy to measure: start a Stopwatch in the control thread just before requesting the stop, and record the elapsed time right after the while loop in the worker thread. These are the results I measured by averaging 1000 samples and repeating that 10 times:
volatile bool, x86:         550 nanoseconds
volatile bool, x64:         550 nanoseconds
ManualResetEvent, x86:     2270 nanoseconds
ManualResetEvent, x64:     2250 nanoseconds
AutoResetEvent, x86:       2500 nanoseconds
AutoResetEvent, x64:       2100 nanoseconds
ManualResetEventSlim, x86:  650 nanoseconds
ManualResetEventSlim, x64:  630 nanoseconds
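A rough sketch of that measurement for the volatile bool case follows. It is an assumed reconstruction, not the exact harness behind the numbers above. The `StopLatencyBenchmark` name, the `_running` handshake and the choice to create a fresh thread per sample were all made for this sketch. `Stopwatch.ElapsedTicks` is converted to nanoseconds through `Stopwatch.Frequency`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stop-latency harness for the volatile bool case.
class StopLatencyBenchmark
{
    private static volatile bool _stop;
    private static volatile bool _running;
    private static readonly Stopwatch _sw = new Stopwatch();
    private static long _observedTicks;

    private static void Worker()
    {
        _running = true;
        while (!_stop) { } // hot loop: nothing but the stop check
        // Ticks elapsed between the control thread's Restart() and the loop exiting.
        _observedTicks = _sw.ElapsedTicks;
    }

    public static double MeasureAverageNanoseconds(int samples)
    {
        double totalNs = 0;
        for (int i = 0; i < samples; i++)
        {
            _stop = false;
            _running = false;
            var t = new Thread(Worker);
            t.Start();
            while (!_running) { Thread.Yield(); } // wait until the worker is spinning

            _sw.Restart();
            _stop = true; // the stop request being timed (volatile write publishes Restart)
            t.Join();     // Join makes _observedTicks visible here

            totalNs += _observedTicks * 1e9 / Stopwatch.Frequency;
        }
        return totalNs / samples;
    }

    static void Main()
    {
        // 10 runs of 1000 samples each, mirroring the description above.
        for (int run = 0; run < 10; run++)
        {
            Console.WriteLine($"volatile bool: {MeasureAverageNanoseconds(1000):F0} ns");
        }
    }
}
```

Measuring one of the event types instead only changes the loop condition and the line that issues the stop request.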
Beware that the results for volatile bool are very unlikely to look that good on a processor with a weak memory model, like ARM or Itanium. I don't have one to test on.
Clearly you'll want to favor ManualResetEventSlim, which gives you good perf and a guarantee that volatile bool doesn't.
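Here is a minimal sketch of the same worker, stopped through a ManualResetEventSlim instead. It assumes the worker still only polls the stop condition. `SlimEventWorker` and its members are hypothetical names. `IsSet`, `Set` and `Dispose` are real ManualResetEventSlim members.

```csharp
using System;
using System.Threading;

// Hypothetical worker stopped through a ManualResetEventSlim instead of a volatile bool.
class SlimEventWorker : IDisposable
{
    private readonly ManualResetEventSlim _stopEvent = new ManualResetEventSlim(false);

    public long Iterations { get; private set; }

    public void Run()
    {
        long count = 0;
        // IsSet just reads the event's state field; no kernel wait handle is involved.
        while (!_stopEvent.IsSet)
        {
            count++; // stand-in for real work
        }
        Iterations = count;
    }

    // Called from the control thread.
    public void RequestStop() => _stopEvent.Set();

    public void Dispose() => _stopEvent.Dispose();

    static void Main()
    {
        using (var worker = new SlimEventWorker())
        {
            var thread = new Thread(worker.Run);
            thread.Start();
            Thread.Sleep(50);
            worker.RequestStop();
            thread.Join();
            Console.WriteLine($"Stopped after {worker.Iterations} iterations");
        }
    }
}
```

The worker might block between work items instead of spinning. In that case, `_stopEvent.Wait(timeout)` returns `true` as soon as `Set` is called, so the same event can double as an interruptible sleep.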
One note about these results: they were measured with the worker thread running a hot loop, constantly testing the stop condition and doing no other work. That's not a good match for real code, where a thread typically won't check the stop condition that often. That makes the differences between these techniques largely inconsequential.