When I run my algorithm with and without an OpenMP `schedule` clause, the performance difference is dramatic: with `schedule(dynamic, 64)` it finishes in 4 seconds, and without it in 14 seconds. I thought perf would provide some insight into why this happens, but the stats for the two runs are very similar.
Is it safe to assume that dynamic scheduling has addressed a load-balancing problem? I was hoping to find evidence of that in the perf output. The perf details are below in case they help, along with the PageRank loop where the scheduling is used:
    # pragma omp parallel for schedule(dynamic, 64)
    for (int u = 0; u < vertex_count; u++) {
        int* in_edge = G.getAdjList(u);
        double sum = 0.0;
        for (int j = 0; j < in_edge_counts[u]; j++) {
            int v = in_edge[j];
            sum += conntrib[v];
        }
        pr_temp[u] = sum * damp + adj;
    }
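One way to test the load-balancing theory without perf is to model how the per-vertex work (`in_edge_counts[u]`) would be split across threads. The sketch below is only a rough model, not what the OpenMP runtime actually does: `max_load_static` and `max_load_dynamic` are hypothetical helper names, the static case assumes each thread gets one contiguous block of roughly `n / threads` iterations (the default schedule is implementation-defined), and the dynamic case assumes each chunk goes to whichever thread is free first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: heaviest per-thread load if the iterations are split
// into one contiguous block per thread (assumed model of the default schedule).
long long max_load_static(const std::vector<int>& cost, int threads) {
    const std::size_t n = cost.size();
    long long worst = 0;
    for (int t = 0; t < threads; t++) {
        const std::size_t begin = n * t / threads;
        const std::size_t end = n * (t + 1) / threads;
        long long load = 0;
        for (std::size_t u = begin; u < end; u++) load += cost[u];
        worst = std::max(worst, load);
    }
    return worst;
}

// Hypothetical helper: heaviest per-thread load if chunks of `chunk`
// iterations are handed out in order to the least-loaded thread
// (assumed model of schedule(dynamic, chunk)).
long long max_load_dynamic(const std::vector<int>& cost, int threads,
                           std::size_t chunk) {
    std::vector<long long> busy(threads, 0);
    for (std::size_t begin = 0; begin < cost.size(); begin += chunk) {
        const std::size_t end = std::min(cost.size(), begin + chunk);
        long long work = 0;
        for (std::size_t u = begin; u < end; u++) work += cost[u];
        // The thread that has accumulated the least work takes this chunk.
        *std::min_element(busy.begin(), busy.end()) += work;
    }
    return *std::max_element(busy.begin(), busy.end());
}
```

Passing the real `in_edge_counts` array as `cost` gives a rough idea of the imbalance. If the high-degree vertices cluster in one part of the ID range, the static figure should come out much larger than the dynamic one, which would be consistent with the speedup I am seeing.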
With scheduling:
     107470.977295      task-clock (msec)         #    1.743 CPUs utilized
             1,187      context-switches          #    0.011 K/sec
                44      cpu-migrations            #    0.000 K/sec
         2,279,522      page-faults               #    0.021 M/sec
   255,920,277,205      cycles                    #    2.381 GHz                      (20.00%)
    17,116,048,117      stalled-cycles-frontend   #    6.69% frontend cycles idle     (20.02%)
   153,944,352,418      stalled-cycles-backend    #   60.15% backend cycles idle      (20.02%)
   148,412,677,859      instructions              #    0.58  insn per cycle
                                                  #    1.04  stalled cycles per insn  (30.01%)
    27,479,936,585      branches                  #  255.696 M/sec                    (40.01%)
       321,470,463      branch-misses             #    1.17% of all branches          (50.01%)
    78,562,370,506      L1-dcache-loads           #  731.010 M/sec                    (50.00%)
     2,075,635,902      L1-dcache-load-misses     #    2.64% of all L1-dcache hits    (49.99%)
     3,100,740,665      LLC-loads                 #   28.852 M/sec                    (50.00%)
       964,981,918      LLC-load-misses           #   31.12% of all LL-cache hits     (50.00%)
Without scheduling:
      106872.881349      task-clock (msec)         #    1.421 CPUs utilized
             1,237      context-switches          #    0.012 K/sec
                69      cpu-migrations            #    0.001 K/sec
         2,262,865      page-faults               #    0.021 M/sec
   254,236,425,448      cycles                    #    2.379 GHz                      (20.01%)
    14,384,218,171      stalled-cycles-frontend   #    5.66% frontend cycles idle     (20.04%)
   163,855,226,466      stalled-cycles-backend    #   64.45% backend cycles idle      (20.03%)
   149,318,162,762      instructions              #    0.59  insn per cycle
                                                  #    1.10  stalled cycles per insn  (30.03%)
    27,627,422,078      branches                  #  258.507 M/sec                    (40.03%)
       213,805,935      branch-misses             #    0.77% of all branches          (50.03%)
    78,495,942,802      L1-dcache-loads           #  734.480 M/sec                    (50.00%)
     2,089,837,393      L1-dcache-load-misses     #    2.66% of all L1-dcache hits    (49.99%)
     3,166,900,999      LLC-loads                 #   29.632 M/sec                    (49.98%)
       929,170,535      LLC-load-misses           #   29.34% of all LL-cache hits     (49.98%)
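One figure that does differ between the two runs is "CPUs utilized", which perf reports as task-clock divided by elapsed wall-clock time. Since task-clock is nearly identical in both runs, the elapsed time implied by each summary can be recovered from those two numbers. The helper below is a small sketch of that arithmetic; `elapsed_seconds` is a hypothetical name, and the result covers the whole process (presumably including graph loading), not just the PageRank loop.

```cpp
// Hypothetical helper: elapsed wall-clock seconds implied by a perf stat
// summary, using "CPUs utilized" = task-clock / elapsed time.
double elapsed_seconds(double task_clock_msec, double cpus_utilized) {
    return task_clock_msec / cpus_utilized / 1000.0;
}
```

Calling `elapsed_seconds(107470.977295, 1.743)` and `elapsed_seconds(106872.881349, 1.421)` gives roughly 62 s and 75 s for the two runs. The same total CPU time spread over a longer wall-clock time is what I would expect if some threads sit idle while others finish their share, but it does not prove it on its own.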
 
     
    