I also came up with another solution, based on a SeqLock.
Once I realized that what I was trying to achieve is essentially tear detection, I rewrote it using a SeqLock template. I still define my three variables a, b, c as _Atomic uint32_t, since I also want to modify them in thread_low_priority using atomic_fetch_*.
On the ARMv7-M architecture, atomic RMW operations are implemented using ldrex/strex, and the compiler emits a loop that retries until strex succeeds. In my case that could be a problem, because thread_high_priority needs to be fast and run uninterrupted. I currently don't know whether there is a case where strex would always fail in the thread_high_priority context, which would make that loop spin forever (strictly a livelock rather than a deadlock).
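To make the retry loop concrete, here is a minimal portable C11 sketch of the shape an ldrex/strex-based fetch-add takes. The helper name fetch_add_retry_loop is hypothetical, and the weak compare-exchange only stands in for the strex success check. It is an analogue, not the exact code GCC emits.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: C11 analogue of an ldrex/strex retry loop.
 * Returns the value the object held before the addition. */
static uint32_t fetch_add_retry_loop(_Atomic uint32_t *obj, uint32_t delta)
{
    /* Roughly corresponds to ldrex: read the current value. */
    uint32_t expected = atomic_load_explicit(obj, memory_order_relaxed);

    /* Roughly corresponds to strex plus the branch back on failure.
     * On failure, 'expected' is refreshed with the current value and
     * the new sum is recomputed on the next iteration. */
    while (!atomic_compare_exchange_weak_explicit(obj, &expected,
                                                  expected + delta,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* retry */
    }
    return expected;
}
```

The loop only repeats when the store half fails. As far as the Cortex-M3 documentation describes it, the exclusive tag is cleared by an exception, by CLREX, or by a store-exclusive, so a context that is never preempted would normally be expected to succeed on the first attempt. Treat that as an assumption to verify against the target's reference manual.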
#include <stdatomic.h>
#include <stdint.h>

_Atomic uint32_t a, b, c;
atomic_uint seqcount = 0;
void thread_high_priority(void)
{
  uint32_t _a, _b, _c;
  
  unsigned int orig_cnt = atomic_load_explicit(&seqcount, memory_order_relaxed);
  atomic_store_explicit(&seqcount, orig_cnt + 1, memory_order_relaxed);
  atomic_thread_fence(memory_order_release);
  _a = atomic_load_explicit(&a, memory_order_relaxed);
  _b = atomic_load_explicit(&b, memory_order_relaxed);
  _c = atomic_load_explicit(&c, memory_order_relaxed);
  atomic_store_explicit(&a, _a - 1, memory_order_relaxed);
  atomic_store_explicit(&b, _b + 1, memory_order_relaxed);
  atomic_store_explicit(&c, _c - 1, memory_order_relaxed);
  atomic_store_explicit(&seqcount, orig_cnt + 2, memory_order_release);
}
void thread_low_priority(void)
{
  uint32_t _a, _b, _c;
  
  unsigned int c0, c1;
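  do {
    c0 = atomic_load_explicit(&seqcount, memory_order_acquire);
    _a = atomic_load_explicit(&a, memory_order_relaxed);
    _b = atomic_load_explicit(&b, memory_order_relaxed);
    _c = atomic_load_explicit(&c, memory_order_relaxed);
    // An acquire load does not stop the earlier relaxed loads from moving
    // past it, so use an acquire fence before re-reading seqcount.
    atomic_thread_fence(memory_order_acquire);
    c1 = atomic_load_explicit(&seqcount, memory_order_relaxed);
  } while ((c0 & 1) || c0 != c1);  // retry if a write was in progress or happened in between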
}
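For reference, here is a self-contained sketch of the same pattern packaged so that the reader returns a consistent snapshot. The names snapshot_t, sl_write_step and sl_read, and the sl_ prefixed variables, are hypothetical and used only for illustration. The reader uses an acquire fence before re-checking the counter, on the assumption that the usual C11 SeqLock formulation applies here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot type for the three protected values. */
typedef struct { uint32_t a, b, c; } snapshot_t;

static _Atomic uint32_t sl_a, sl_b, sl_c;  /* zero-initialized */
static atomic_uint sl_seq;                 /* even = stable, odd = write in progress */

/* Writer: assumed to be the only writer and never preempted by the reader. */
static void sl_write_step(void)
{
    unsigned int s = atomic_load_explicit(&sl_seq, memory_order_relaxed);
    atomic_store_explicit(&sl_seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* odd count ordered before data stores */

    uint32_t a = atomic_load_explicit(&sl_a, memory_order_relaxed);
    uint32_t b = atomic_load_explicit(&sl_b, memory_order_relaxed);
    uint32_t c = atomic_load_explicit(&sl_c, memory_order_relaxed);
    atomic_store_explicit(&sl_a, a - 1, memory_order_relaxed);
    atomic_store_explicit(&sl_b, b + 1, memory_order_relaxed);
    atomic_store_explicit(&sl_c, c - 1, memory_order_relaxed);

    atomic_store_explicit(&sl_seq, s + 2, memory_order_release);
}

/* Reader: retries until it sees the same even count before and after. */
static snapshot_t sl_read(void)
{
    snapshot_t out;
    unsigned int c0, c1;
    do {
        c0 = atomic_load_explicit(&sl_seq, memory_order_acquire);
        out.a = atomic_load_explicit(&sl_a, memory_order_relaxed);
        out.b = atomic_load_explicit(&sl_b, memory_order_relaxed);
        out.c = atomic_load_explicit(&sl_c, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* data loads stay before the re-check */
        c1 = atomic_load_explicit(&sl_seq, memory_order_relaxed);
    } while ((c0 & 1u) || c0 != c1);
    return out;
}
```

Returning the snapshot by value keeps the retry loop in one place, so callers never see a torn combination of a, b and c.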
Edit: After checking the compiler output again, I slightly modified my code in thread_high_priority. It was compiled with ARM gcc 10.3.1 (2021.10 none) using the flags -O1 -mcpu=cortex-m3 -std=gnu18 -mthumb.
In my original code, dmb ish is issued before the store, as shown below.
atomic_store_explicit(&seqcount, orig_cnt + 1, memory_order_release);
-->
        adds    r1, r2, #1
        dmb     ish
        str     r1, [r3]
After I separated the memory barrier from the store, dmb ish is issued after the store, so the update of seqcount becomes visible before a, b, and c are updated.
atomic_store_explicit(&seqcount, orig_cnt + 1, memory_order_relaxed);
atomic_thread_fence(memory_order_release);
-->
        adds    r1, r2, #1
        str     r1, [r3]
        dmb     ish