I am trying to generate random numbers with the PCG method. I have tested two different implementations, given as block 1 and block 2 in the code below. Block 1 is correct and scales as expected with the number of threads. Block 2 does not scale correctly, and I do not understand what is wrong with it.
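#include <chrono>
#include <cstdint>   // intptr_t, UINT32_MAX
#include <cstdio>    // printf (its address is used in the seed)
#include <ctime>     // time
#include <iostream>
#include <omp.h>
#include "../include_random/pcg_basic.hpp"
using namespace std; // cout, endl and chrono are used unqualified below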
int main()
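{
    // Assumed values: the original declarations are not shown.
    const int threadSNumber = 4;   // the timings below use 1 to 4 threads
    const int N = 100000000;       // placeholder problem size
    double sum = 0.0;
    std::chrono::system_clock::time_point startingTime;
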
    // /*Block 1*/
    // omp_set_num_threads (threadSNumber);
    // startingTime = std::chrono::system_clock::now();
    // #pragma omp parallel
    // {
    //     int threadID = omp_get_thread_num();
    //     pcg32_random_t rng;
    //     pcg32_srandom_r(&rng, time(NULL) ^ (intptr_t)&printf,(intptr_t)&threadID);
    //     // uint32_t bound =1;
    //     #pragma omp for reduction (+:sum)
    //     for (int step = 0; step < N; step++)
    //     {
    //         // sum += 0.5 - (double)pcg32_boundedrand_r(&rng,bound);
    //         sum += 0.5 -((double)pcg32_random_r(&rng)/(double)UINT32_MAX);
    //     }
    // }
    /**Block 2**/
    omp_set_num_threads (threadSNumber);
    pcg32_random_t *rng;
    rng = new pcg32_random_t[threadSNumber];
    #pragma omp parallel
    {
        int threadID = omp_get_thread_num();
        pcg32_srandom_r(&rng[threadID], time(NULL) ^ (intptr_t)&printf,(intptr_t)&threadID);
    }
    startingTime = std::chrono::system_clock::now();
    #pragma omp parallel
    {
        int threadID = omp_get_thread_num();
        #pragma omp for reduction (+:sum)
        for (int step = 0; step < N; step++)
        {
            sum += 0.5 -((double)pcg32_random_r(&rng[threadID])/(double)UINT32_MAX);
        }
    }
    delete[] rng;
    /****/
    auto end = std::chrono::system_clock::now();
    auto diff = end - startingTime;
    double total_time = chrono::duration <double, std::ratio<1>> (diff).count();
    cout << "The result of the sum is "<< sum/N << "\n" << endl;
    cout << "# Total time:  "<<  (int)total_time/3600<<"h    "<< ((int)total_time%3600)/60<<"m    "<< (int)total_time%60 << "s        (" << total_time << " s)" << endl;
    return 0;
}
Block 1 scales as expected with the number of threads, but block 2 does not. Measured run times in seconds:
# threads      1      2      3      4
block 1 (s)    3.27   1.64   1.12   0.83
block 2 (s)    4.60   13.7   8.28   10.9
These blocks are minimal examples that reproduce the issue. They are taken from a larger function in a larger code base.
I want to initialize the seed only once. At every time step I then compute a batch of random numbers that are used in another function (not summed as above; the sum is only there to record something). Block 1 could be used, but it would mean seeding the generator at every time step instead of once. Moreover, I do not understand the scaling of block 2.
What is wrong with block 2? Why do I get this scaling? The threads are not using the same rng, so there should be no data race, unless I misunderstand something.
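For reference, here is a minimal sketch of the "seed once, reuse the state at every time step" pattern described above. It rests on an unconfirmed assumption: that the slowdown in block 2 comes from neighbouring `rng[threadID]` elements sitting in the same cache line (false sharing), since each thread writes its generator state on every call. The sketch therefore lets each thread work on a private copy of its state and write it back once at the end. `Pcg32`, `pcg32Next` and `pcg32Seed` are stand-ins for the types and functions in `pcg_basic.hpp`, included only to keep the example self-contained; `seedGenerators`, `timeStep` and `currentThread` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Stand-in for pcg32_random_t from pcg_basic.hpp (assumption: same recurrence).
struct Pcg32 {
    std::uint64_t state = 0;
    std::uint64_t inc = 1;
};

// Stand-in for pcg32_random_r: advance the state and return 32 random bits.
inline std::uint32_t pcg32Next(Pcg32 &rng) {
    const std::uint64_t old = rng.state;
    rng.state = old * 6364136223846793005ULL + rng.inc;
    const std::uint32_t xorshifted =
        static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    const std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

// Stand-in for pcg32_srandom_r: seed with an initial state and a stream id.
inline void pcg32Seed(Pcg32 &rng, std::uint64_t initState, std::uint64_t initSeq) {
    rng.state = 0;
    rng.inc = (initSeq << 1u) | 1u;
    pcg32Next(rng);
    rng.state += initState;
    pcg32Next(rng);
}

// Hypothetical helper: thread index, or 0 when compiled without OpenMP.
static int currentThread() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Seed one generator per thread, once, outside the time-step loop.
std::vector<Pcg32> seedGenerators(int numThreads, std::uint64_t seed) {
    assert(numThreads > 0);
    std::vector<Pcg32> rngs(numThreads);
    for (int t = 0; t < numThreads; ++t)
        pcg32Seed(rngs[t], seed, static_cast<std::uint64_t>(t)); // stream = thread index
    return rngs;
}

// One time step: returns the mean of (0.5 - u) over n draws.
double timeStep(std::vector<Pcg32> &rngs, long n) {
    double sum = 0.0;
    #pragma omp parallel num_threads(static_cast<int>(rngs.size()))
    {
        const int tid = currentThread();
        Pcg32 local = rngs[tid];   // private copy: the hot loop never touches the shared array
        #pragma omp for reduction(+:sum)
        for (long step = 0; step < n; ++step)
            sum += 0.5 - static_cast<double>(pcg32Next(local)) /
                             static_cast<double>(UINT32_MAX);
        rngs[tid] = local;         // save the advanced state for the next time step
    }
    return sum / static_cast<double>(n);
}
```

The intended use would be to call `seedGenerators(threadSNumber, seed)` once and then call `timeStep(rngs, N)` at every time step. If false sharing is indeed the cause, this variant should scale like block 1; I have not measured it.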
