Assume I have a function f(i) which depends on an index i (among other values which cannot be precomputed). 
I want to fill an array a so that a[n] = sum(f(i)) from i=0 to n (inclusive).
Edit: After a comment by Hristo Iliev I realized what I am doing is a cumulative/prefix sum.
This can be written in code as
float sum = 0;
for(int i=0; i<N; i++) {
    sum += f(i);
    a[i] = sum;
}
Now I want to use OpenMP to do this in parallel. One way is to compute the values of f(i) in parallel and then resolve the dependency in a serial loop. If f(i) is a slow function, this could work well, since the serial loop is cheap by comparison.
#pragma omp parallel for
for(int i=0; i<N; i++) {
    a[i] = f(i);
}
for(int i=1; i<N; i++) {
    a[i] += a[i-1];
}
But it is possible to do this with OpenMP without the serial loop. The solution I have come up with, however, is complicated and perhaps hackish. So my question is: is there a simpler, less convoluted way to do this with OpenMP?
The code below runs the first loop I listed on each thread's own contiguous chunk of the index range. As a result, the values of a within a given thread's chunk are correct up to a constant offset. I save the total for each thread to an array suma with nthreads+1 elements (suma[0] is zero). This lets the threads communicate: after a barrier, each thread adds up the totals of the chunks before it to get its offset. Then I correct the values of a[i] with that offset.
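float *suma;  // suma[t+1] will hold the total of thread t's chunk
#pragma omp parallel
{
    const int ithread = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();
    // Contiguous chunk [start, finish) handled by this thread
    const int start = ithread*N/nthreads;
    const int finish = (ithread+1)*N/nthreads;
    #pragma omp single
    {
        suma = new float[nthreads+1];
        suma[0] = 0;
    }   // implicit barrier here: suma is allocated before any thread continues
    // Local prefix sum over this thread's chunk
    float sum = 0;
    for (int i=start; i<finish; i++) {
        sum += f(i);
        a[i] = sum;
    }
    suma[ithread+1] = sum;
    // Wait until every thread has published its chunk total
    #pragma omp barrier
    // Offset = sum of the totals of all preceding chunks
    float offset = 0;
    for(int i=0; i<(ithread+1); i++) {
        offset += suma[i];
    }
    for(int i=start; i<finish; i++) {
        a[i] += offset;
    }
}
delete[] suma;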
A simple test is just to set f(i) = i.  Then the solution is a[i] = i*(i+1)/2 (and at infinity it's -1/12).
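For reference, here is a minimal self-contained sketch of that test. It wraps the per-thread scheme in a function and compares the result with i*(i+1)/2. The names prefix_sum_blocked and matches_closed_form are my own and purely illustrative, and std::vector stands in for new[]/delete[]. The #ifdef _OPENMP guards are an addition so that it also builds serially without -fopenmp. The exact float comparison is only safe because, for N up to a few thousand, every partial sum is an integer below 2^24.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Test function suggested in the text: f(i) = i.
static float f(int i) { return static_cast<float>(i); }

// Blocked inclusive prefix sum following the per-thread scheme above.
// Name and signature are illustrative, not part of the original code.
std::vector<float> prefix_sum_blocked(int N) {
    assert(N >= 0);
    std::vector<float> a(static_cast<std::size_t>(N));
    std::vector<float> suma;  // suma[t+1] holds the total of thread t's chunk

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int ithread = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
#else
        const int ithread = 0;   // serial fallback when built without OpenMP
        const int nthreads = 1;
#endif
        // 64-bit intermediate avoids overflow of ithread*N for large N.
        const int start =
            static_cast<int>(static_cast<long long>(ithread) * N / nthreads);
        const int finish =
            static_cast<int>(static_cast<long long>(ithread + 1) * N / nthreads);

        // One thread sizes suma; the implicit barrier after single makes it
        // visible to all threads before they write their totals.
        #pragma omp single
        suma.assign(static_cast<std::size_t>(nthreads) + 1, 0.0f);

        // Local prefix sum over this thread's chunk.
        float sum = 0;
        for (int i = start; i < finish; i++) {
            sum += f(i);
            a[i] = sum;
        }
        suma[ithread + 1] = sum;

        // All chunk totals must be published before offsets are computed.
        #pragma omp barrier
        float offset = 0;
        for (int i = 0; i < ithread + 1; i++) offset += suma[i];
        for (int i = start; i < finish; i++) a[i] += offset;
    }
    return a;
}

// Checks a[i] == i*(i+1)/2, the closed form for f(i) = i.
bool matches_closed_form(const std::vector<float>& a) {
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] != static_cast<float>(i * (i + 1) / 2)) return false;
    }
    return true;
}
```

Calling matches_closed_form(prefix_sum_blocked(1000)) should return true for any thread count, since all the intermediate values are small integers and the order of the additions does not change the result.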
 
     
    