I was looking further into pthread barriers, following Peter Chapin's pthread Tutorial (section 3.2, Barriers, pg 11), which uses two barriers in the thread function. The first, loop_barrier, suspends all threads until every thread has reached it; the arbitrary thread for which pthread_barrier_wait() returns PTHREAD_BARRIER_SERIAL_THREAD is then elected to do any serial cleanup. The second, prep_barrier, ensures all threads stay suspended until that serial cleanup is done.
My understanding is that this allows threads to run continually while providing thread synchronization at a given point in the processing, where all work for a cycle is completed before all threads continue running concurrently. The example simply shows what occurs for one such cycle, after which a done flag is set and the thread function returns.
All threads do suspend and wait on loop_barrier and prep_barrier, but the problem is that following the thread function return the program stalls on the first pthread_join(), which gdb explains, rather unhelpfully, is the result of "in pthread_barrier_destroy () from /lib64/libpthread.so.0".
The tutorial provides only a framework for the thread function and main program, so I added the minimum needed to complete it: a struct that holds the loop limits for the for loop in each thread function, plus members for the thread index and the sum of the for loop variable values. Apparently I didn't understand the barriers quite as completely as I thought I did. The code causing the hang-on-join is:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#define handle_error_en(en, msg) \
  do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)
#define handle_error(msg) \
  do { perror(msg); exit(EXIT_FAILURE); } while (0)
#define NCPU 4
#define ITER_PER_CPU  100
typedef struct {
  int index, start, end;
  unsigned sum;
} loop_data;
pthread_barrier_t loop_barrier;   /* global barriers (could pass in data) */
pthread_barrier_t prep_barrier;
void *thread_fn (void *data)
{
  int done = 0, 
      i = 0;
  loop_data *thread_data = data;
  
  do {
    for (i = thread_data->start; i < thread_data->end; i++) {
      /* each arg gets separate loop_data - do work */
      thread_data->sum += i;
    }
    
    /* suspend on barrier and do any per-cycle cleanup */
    if (pthread_barrier_wait (&loop_barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
      puts ("PTHREAD_BARRIER_SERIAL_THREAD");
      /* no actual per-cycle cleanup, just set done flag */
      done = 1;
    }
    /* suspend on barrier until per-cycle cleanup complete */
    pthread_barrier_wait (&prep_barrier);
    
    printf ("thread index: %d, sum: %u\n", 
            thread_data->index, thread_data->sum);
    
  } while (!done);
  
  return data;
}
int main (void) {
  pthread_t id[NCPU];
  pthread_attr_t attr;
  loop_data arr[NCPU] = {{ .start = 0 }};
  void *res;
  int rtn = 0;
  
  /* initialize barriers and validate */
  if ((rtn = pthread_barrier_init (&loop_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-loop_barrier");
  }
  if ((rtn = pthread_barrier_init (&prep_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-prep_barrier");
  }
  
  /* initialize thread attributes (using defaults) and validate */
  if ((rtn = pthread_attr_init (&attr))) {
    handle_error_en (rtn, "pthread_attr_init");
  }
  
  /* set data index, start, end and create/validate each thread */ 
  for (int i = 0; i < NCPU; i++) {
    /* initialize index, start / end values */
    arr[i].index = i;
    arr[i].start = i * ITER_PER_CPU;
    arr[i].end = (i + 1) * ITER_PER_CPU;
    printf ("id: %d, start: %3d, end: %3d\n", i, arr[i].start, arr[i].end);
    /* create thread and validate */
    if ((rtn = pthread_create (&id[i], &attr, thread_fn, &arr[i]))) {
      handle_error_en (rtn, "pthread_create");
    }
  }
  /* join all threads and compare sums from threads with sums in main */
  for (int i = 0; i < NCPU; i++) {
    loop_data *data = NULL;
    /* join and validate */
    printf ("joining thread index: %d\n", i);
    if ((rtn = pthread_join (id[i], &res))) {
      fprintf (stderr, "error: thread %d\n", i);
      handle_error_en (rtn, "pthread_join");
    }
    data = res;   /* pointer to return struct provided through parameter */
    printf ("thread index: %d joined\n", data->index);
  }
  
  /* destroy barriers and validate */
  if ((rtn = pthread_barrier_destroy (&loop_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-loop_barrier");
  }
  if ((rtn = pthread_barrier_destroy (&prep_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-prep_barrier");
  }
}
Example Use/Output
$ ./bin/pthread-vtctut-04
id: 0, start:   0, end: 100
id: 1, start: 100, end: 200
id: 2, start: 200, end: 300
id: 3, start: 300, end: 400
joining thread index: 0
PTHREAD_BARRIER_SERIAL_THREAD
thread index: 2, sum: 24950
thread index: 0, sum: 4950
thread index: 1, sum: 14950
thread index: 3, sum: 34950
^C
The manual interrupt was issued where the code hangs, on line 94 of my source file at if ((rtn = pthread_join (id[i], &res))) {. So why, since each thread function is released by the second barrier (as indicated by the "thread index: x, sum: yyyy" output), does the code hang on pthread_join() in main()?
 
     
    