I have many functions that loop over data stored in a 4D array. We use OpenMP to parallelise these loops in the places where it makes the most sense and where iterations do not write to the same elements.
For instance, I have the following snippet of code:
void MaximumCombiner::createComplexData(double**** input, const int nX, const int nY,
                                        const int nZ, const int nA, std::complex<double>**** output,
                                        const double* beamData)
{
    for(int iX = 0; iX < nX; ++iX)
    {
        for(int iY = 0; iY < nY; ++iY)
        {
            std::complex<double> complexArg(0.0, (beamData[iY] * M_PI / 180.0));
            std::complex<double> complexExp = std::exp(complexArg);
            for(int iZ = 0; iZ < nZ; ++iZ)
            {
                for(int iA = 0; iA < nA; ++iA)
                {
                    output[iX][iY][iZ][iA] = input[iX][iY][iZ][iA] * complexExp;
                }
            }
        }
    }
}
Originally, I thought I should add a #pragma omp parallel for before each for-loop, but now I am wondering whether I am spending more time on the overhead of creating and destroying threads than on actual work. I have also tried putting a #pragma omp parallel above the first for-loop and a #pragma omp for on one of the inner loops, but I am not sure that is best either. What should I look for when deciding where to place my OpenMP directives?
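
For concreteness, here is roughly what the two shapes look like, written as stripped-down free functions. The names variantA/variantB and the choice of the iY loop for the work-sharing directive are just placeholders for this sketch, and variant A shows the combined directive on the outermost loop only (rather than on every loop level) to keep it short:

#include <cmath>
#include <complex>

// Variant A: combined parallel-for on the outermost loop. A thread team is
// created (or woken) when the region is entered, the iX iterations are split
// across the threads, and nothing is shared except disjoint slices of output.
void variantA(double**** input, int nX, int nY, int nZ, int nA,
              std::complex<double>**** output, const double* beamData)
{
    #pragma omp parallel for
    for(int iX = 0; iX < nX; ++iX)
    {
        for(int iY = 0; iY < nY; ++iY)
        {
            const std::complex<double> complexExp =
                std::exp(std::complex<double>(0.0, beamData[iY] * M_PI / 180.0));
            for(int iZ = 0; iZ < nZ; ++iZ)
                for(int iA = 0; iA < nA; ++iA)
                    output[iX][iY][iZ][iA] = input[iX][iY][iZ][iA] * complexExp;
        }
    }
}

// Variant B: one parallel region around everything, with the work-sharing
// directive on the iY loop. The thread team is set up once, but every
// "#pragma omp for" ends with an implicit barrier, so the threads
// synchronise once per iX iteration.
void variantB(double**** input, int nX, int nY, int nZ, int nA,
              std::complex<double>**** output, const double* beamData)
{
    #pragma omp parallel
    {
        for(int iX = 0; iX < nX; ++iX)
        {
            #pragma omp for
            for(int iY = 0; iY < nY; ++iY)
            {
                const std::complex<double> complexExp =
                    std::exp(std::complex<double>(0.0, beamData[iY] * M_PI / 180.0));
                for(int iZ = 0; iZ < nZ; ++iZ)
                    for(int iA = 0; iA < nA; ++iA)
                        output[iX][iY][iZ][iA] = input[iX][iY][iZ][iA] * complexExp;
            }
        }
    }
}

Both shapes compile with -fopenmp; my question is about how to reason about which shape (and which loop level) to parallelise, not whether they are legal.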