I'll give my two cents, without being 100% sure that what I am about to say is correct. Joe Duffy is a world-class expert in multithreading, but I think that in this implementation he has been overly cautious regarding the cross-thread visibility of the internal state of the LockFreeStack<T> class. The volatility of the Node.m_next field is redundant in my opinion. The Node instances are mutable, but they are only mutated before they are linked in the internal linked list. After that phase they are practically immutable. That mutable phase is performed by a single thread, so there is no chance that another thread may see a stale version of a Node instance.
That leaves only the possibility of instructions re-ordering, as a reason for declaring the Node.m_next as volatile. Since the n.m_next = h; assignement is sandwiched between reading another volatile field (m_head), and an Interlocked.CompareExchange operation, I would assume that a possible re-ordering of the instructions that would compromise the correctness of the implementation is already prevented, but as I said I am not 100% sure.
I am pretty sure though that the implementation of the LockFreeStack<T> class could be improved performance-wise, at the cost of becoming slightly more allocatey, by making immutable the Node class. Nowadays (C# 9) this can be achieved simply by switching from type class to type record. This is how such an implementation would look like:
class LockFreeStack<T>
{
    record Node(T Value, Node Next) { }
    Node _head;
    void Push(T value)
    {
        Node h = Volatile.Read(ref _head);
        while (true)
        {
            Node n = new Node(value, h);
            var original = Interlocked.CompareExchange(ref _head, n, h);
            if (original == h) break;
            h = original;
        }
    }
    T Pop()
    {
        Node h = Volatile.Read(ref _head);
        while (true)
        {
            if (h == null) throw new Exception("Stack empty");
            var original = Interlocked.CompareExchange(ref _head, h.Next, h);
            if (original == h) break;
            h = original;
        }
        return h.Value;
    }
}
Notice that the cost of volatility is incurred only once for each Push or Pop operation. One could even argue that the Volatile.Read of the _head field could be omitted altogether, since a possible stale _head value would be corrected anyway by the first Interlocked.CompareExchange operation, costing only an extra iteration of the while (true) loop.