`return *(uint32_t*)(m_head_memory_location + offset);`
You cast to a non-atomic, non-volatile `uint32_t*` and dereference it!
The compiler is allowed to assume that this `uint32_t` object isn't written by anything else (i.e. to assume there's no data-race UB), so it can and will hoist the load out of the loop, effectively transforming it into something like `if ((val = load) == 0) infinite_loop();`.
A GCC memory barrier will force a reload, but that's an implementation detail of how GCC compiles `std::atomic_thread_fence(std::memory_order_acquire)`.  On x86, that barrier only needs to block compile-time reordering, so a typical GCC implementation is `asm("" ::: "memory")`.
It's not the acquire ordering that's doing anything; it's the memory clobber that stops GCC from assuming another read will return the same value.  That's not something ISO C++ `std::atomic_thread_fence(std::memory_order_acquire)` implies for non-atomic variables.  (A reload is always implied for `atomic` and `volatile` accesses.)  So, like I said, this would work in GCC, but only as an implementation detail.
It's also strict-aliasing UB if this memory is ever accessed with any type other than this one and `char*`, or if the underlying memory was declared as a `char[]` array.  If you got a `char*` from `mmap` or something, then you're fine.
It's also possible misalignment UB unless `offset` is known to be a multiple of 4.  (Although unless GCC chooses to auto-vectorize, this won't bite you in practice on x86.)
You can solve both of those in GNU C with `typedef uint32_t unaligned_u32 __attribute__((may_alias, aligned(1)));`, but you still need `volatile` or `atomic<T>` for re-reading in a loop to work.
**In general**
Use `std::atomic_thread_fence(std::memory_order_acquire);` as required by the C++ memory model; the memory model is what governs reordering at compile time.
When compiling for x86, it won't turn into any asm instructions; in asm it's a no-op.  But if you don't tell the compiler it can't reorder something, your code might break depending on the optimization level.
You might get lucky and have the compiler do a non-atomic load after an atomic `mo_relaxed` load, or it might do the non-atomic load earlier if you don't tell it not to.