I have N threads performing various task and these threads must be regularly synchronized with a thread barrier as illustrated below with 3 thread and 8 tasks. The || indicates the temporal barrier, all threads have to wait until the completion of 8 tasks before starting again.
Thread#1  |----task1--|---task6---|---wait-----||-taskB--|          ...
Thread#2  |--task2--|---task5--|-------taskE---||----taskA--|       ...
Thread#3  |-task3-|---task4--|-taskG--|--wait--||-taskC-|---taskD   ...
I couldn’t find a workable solution, thought the little book of Semaphores http://greenteapress.com/semaphores/index.html was inspiring. I came up with a solution using std::atomic shown below which “seems” to be working using three std::atomic. I am worried about my code breaking down on corner cases hence the quoted verb. So can you share advise on verification of such code? Do you have a simpler fool proof code available?
std::atomic<int> barrier1(0);
std::atomic<int> barrier2(0);
std::atomic<int> barrier3(0);
void my_thread()
{
  while(1) {
    // pop task from queue
    ...
    // and execute task 
    switch(task.id()) {
      case TaskID::Barrier:
        barrier2.store(0);
        barrier1++;
        while (barrier1.load() != NUM_THREAD) {
          std::this_thread::yield();
        }
        barrier3.store(0);
        barrier2++;
        while (barrier2.load() != NUM_THREAD) {
          std::this_thread::yield();
        }
        barrier1.store(0);
        barrier3++;
        while (barrier3.load() != NUM_THREAD) {
          std::this_thread::yield();
        }
       break;
     case TaskID::Task1:
       ...
     }
   }
}
 
     
     
    