The following C program calculates the value of pi using multiple threads (OpenMP).
#include <stdio.h>
#define N 10000000
int main(int argc, char **argv)
{
    long int i, n = N;
    double x, dx, f, sum, pi;
    printf("number of intervals: %ld\n", n);
    sum = 0.0;
    dx = 1.0/(double)n;
    #pragma omp parallel for private(x,f,i) shared(dx, sum,n)
    for (i = 1; i<=n; i++){
        x = dx*((double)(i-0.5));
        f = 4.0/(1.0+x*x);
        #pragma omp critical
        sum+=f;
    }
    pi = dx*sum;
    printf("PI %.24f\n", pi);
    return 0;
}
As far as I can see, the only shared variable on which a race condition can occur is "sum", and the update to it is protected by the critical directive. However, I get a different result on every run:
number of intervals: 10000000
PI 3.141592653589736272579103
number of intervals: 10000000
PI 3.141592653589804218228210
number of intervals: 10000000
PI 3.141592653589829975402381
If I run the same code but use a reduction instead of critical, I get the same result every time:
number of intervals: 10000000
PI 3.141592653589669659197625
What am I doing wrong?
