My requirement is this: each thread allocates its own memory and then processes it:
typedef struct
{
    ......
}A;
A *p[N];
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        p[i] = (A*)calloc(N, sizeof(*p[i]));  /* calloc(count, size) */
        if (NULL == p[i]) {
            return;
        }
        ......          
    }
}
But the compiler complains:
error: invalid exit from OpenMP structured block
     return;
The only workaround I have found is to move the allocation code out of the #pragma omp parallel region:
for (int i = 0; i < N; i++) {
    p[i] = (A*)calloc(N, sizeof(*p[i]));  /* calloc(count, size) */
    if (NULL == p[i]) {
        return;
    }       
}
#pragma omp parallel
{
    #pragma omp for
    ......
}
Is there a better way to handle an allocation failure inside the parallel region?
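One commonly used pattern is to keep the allocation inside the parallel loop, record the failure in a flag instead of returning, and only `return` after the parallel region has ended, where leaving the function is legal again. The sketch below assumes a hypothetical struct `A` with a single `int value` field, a hypothetical `N` of 8, and a hypothetical function name `allocate_and_process`; the "processing" step is a placeholder for the question's elided code. The flag is combined across threads with an OpenMP `reduction(||:...)` clause.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the question's struct A (real fields elided). */
typedef struct {
    int value;
} A;

/* Hypothetical size; the question uses N without defining it. */
#define N 8

/* Allocates and processes p[0..N-1] in parallel.
 * Returns 0 on success; returns -1 if any allocation failed,
 * after freeing everything that was allocated. */
int allocate_and_process(A *p[N])
{
    int failed = 0; /* each thread gets a private copy, OR-ed together at the end */

    #pragma omp parallel for reduction(||:failed)
    for (int i = 0; i < N; i++) {
        p[i] = calloc(N, sizeof *p[i]);
        if (p[i] == NULL) {
            failed = 1;   /* record the error instead of returning */
            continue;     /* legal here: only ends this iteration */
        }
        /* Placeholder for the real processing of p[i]. */
        p[i][0].value = i;
    }

    /* Sequential code again, so returning from the function is allowed. */
    if (failed) {
        for (int i = 0; i < N; i++) {
            free(p[i]);   /* free(NULL) is a no-op, so failed slots are safe */
            p[i] = NULL;
        }
        return -1;
    }
    return 0;
}
```

Note that with `continue` the remaining iterations still run after a failure. If stopping early matters, OpenMP 4.0 and later also offer `#pragma omp cancel for`, but it only takes effect when cancellation is enabled (for example with the `OMP_CANCELLATION=true` environment variable), so the flag is still needed for correctness.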