I would like a general understanding of when I can expect a compiler to vectorize a loop, and when it is worth unrolling the loop by hand to help the compiler decide to vectorize it.
I understand that the details matter a great deal (which compiler, which compilation options, which architecture, how I write the code in the loop, etc.), but I wonder whether there are some general guidelines for modern compilers.
To be more specific, here is an example with a simple loop (the code is not supposed to compute anything useful):
    double *A,*B; // two arrays
    int delay = something; // placeholder for some integer offset
    [...]
    double numer = 0, denomB = 0, denomA = 0;
    for (int idxA = 0; idxA < Asize; idxA++)
    {
        int idxB = idxA + (Bsize-Asize)/2 + delay;
        numer  += A[idxA] * B[idxB];
        denomA += A[idxA] * A[idxA];
        denomB += B[idxB] * B[idxB];
    }
Can I expect a compiler to vectorize the loop or would it be useful to rewrite the code like the following?
    // note: assumes Asize is a multiple of 4; otherwise the last iteration reads past element Asize-1
    for (int idxA = 0; idxA < Asize; idxA += 4)
    {
        int idxB = idxA + (Bsize-Asize)/2 + delay;
        numer  += A[idxA] * B[idxB];
        denomA += A[idxA] * A[idxA];
        denomB += B[idxB] * B[idxB];
        numer  += A[idxA+1] * B[idxB+1];
        denomA += A[idxA+1] * A[idxA+1];
        denomB += B[idxB+1] * B[idxB+1];
        numer  += A[idxA+2] * B[idxB+2];
        denomA += A[idxA+2] * A[idxA+2];
        denomB += B[idxB+2] * B[idxB+2];
        numer  += A[idxA+3] * B[idxB+3];
        denomA += A[idxA+3] * A[idxA+3];
        denomB += B[idxB+3] * B[idxB+3];
    }
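
For anyone who wants to experiment, here is a minimal self-contained sketch of the first loop that can be compiled on its own. The `Sums` struct and the `correlate_sums` name are hypothetical wrappers I introduce only for illustration, and the sketch assumes the computed offset keeps every `B` index in bounds. Compiling it with something like `gcc -O3 -fopt-info-vec` or `clang -O3 -Rpass=loop-vectorize` should report whether the loop was vectorized. As far as I understand, the three `+=` accumulations are floating-point reductions, so the compiler may only vectorize them when it is allowed to reorder the additions (for example with `-ffast-math`), but that is exactly the kind of thing I would like confirmed.

```c
#include <stddef.h>

/* Hypothetical container for the three sums; not part of the original code. */
typedef struct {
    double numer;
    double denomA;
    double denomB;
} Sums;

/* Hypothetical wrapper around the first (non-unrolled) loop.
   Assumption: 0 <= offset and offset + Asize <= number of elements in B,
   so every B[idxB] access is in bounds. */
Sums correlate_sums(const double *A, const double *B,
                    int Asize, int Bsize, int delay)
{
    /* Loop-invariant part of idxB, computed once outside the loop. */
    const int offset = (Bsize - Asize) / 2 + delay;

    double numer = 0, denomA = 0, denomB = 0;
    for (int idxA = 0; idxA < Asize; idxA++) {
        const int idxB = idxA + offset;
        numer  += A[idxA] * B[idxB];
        denomA += A[idxA] * A[idxA];
        denomB += B[idxB] * B[idxB];
    }

    Sums result = { numer, denomA, denomB };
    return result;
}
```

The loop body is deliberately left identical to the one above, so any vectorization report applies to the original formulation rather than to the hand-unrolled one.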
 
    