I have a function like this in C (in pseudo-ish code, dropping the unimportant parts):
int func(int s, int x, int* a, int* r) {
    int i;
    // do some stuff
    for (i=0;i<a_really_big_int;++i) {
        if (s) r[i] = x ^ i;
        else r[i] = x ^ a[i];
        // and maybe a couple other ways of computing r
        // that are equally fast individually
    }
    // do some other stuff
}
This function gets called so often that this loop is a real speed bottleneck in the code. I am wondering about a couple of things:
- Since the switch s is a constant in the function, will good compilers optimize the loop so that the branch isn't slowing things down on every iteration?
- If not, what is a good way to optimize this code?
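For reference, the obvious manual option is to hoist the test on s out of the loop by hand, which is what compilers call loop unswitching (GCC exposes it as -funswitch-loops, which is turned on at -O3). A minimal sketch of that idea follows; the name fill_r and the explicit length n are made up for illustration and stand in for the loop in func and for a_really_big_int:

```c
#include <stddef.h>

/* Hypothetical stand-in for the loop in func: the bound n is passed in
   explicitly (the original uses a_really_big_int), and the branch on s
   is tested once, outside the loop, instead of on every iteration. */
void fill_r(int s, int x, const int *a, int *r, size_t n)
{
    size_t i;

    if (s) {
        /* s != 0: r depends only on x and the index */
        for (i = 0; i < n; ++i)
            r[i] = x ^ (int)i;
    } else {
        /* s == 0: r depends on x and a[i] */
        for (i = 0; i < n; ++i)
            r[i] = x ^ a[i];
    }
}
```

The cost is one copy of the loop per way of computing r, so with several variants the duplication grows; whether it actually beats what the compiler already does at -O2 or -O3 is something only a benchmark or a look at the generated assembly can answer.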
====
Here is an update with a fuller example:
int func(int s,
         int start,int stop,int stride,
         double *x,double *b,
         int *a,int *flips,int *signs,int i_max,
         double *c)
{
  int i,k,st;
  for (k=start; k<stop; k += stride) {
    b[k] = 0;
    for (i=0;i<i_max;++i) {
      /* this is the code in question */
      if (s) st = k^flips[i];
      else st = a[k]^flips[i];
      /* done with code in question */
      b[k] += x[st] * (__builtin_popcount(st & signs[i])%2 ? -c[i] : c[i]);
    }
  }
}
EDIT 2:
In case anyone is curious, I ended up refactoring the code and hoisting the whole inner for loop (the one over i_max) outside. That makes the a_really_big_int loop much simpler and hopefully easy to vectorize, and it also avoids redoing a bunch of extra logic a zillion times.
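The refactored code itself is not shown here. As a rough sketch only, a loop interchange along these lines might look like the following. The name func_interchanged, the void return type, and the const qualifiers are assumptions for illustration, and it assumes b does not overlap x. It relies on __builtin_popcount, so it needs GCC or Clang, like the example above.

```c
/* Hypothetical reworking of func from the fuller example: the loop over i
   is outermost, so flips[i], signs[i], c[i] and the test on s are read
   once per i instead of once per (k, i) pair.
   Assumption: b does not alias x, since all of b is zeroed up front. */
void func_interchanged(int s,
                       int start, int stop, int stride,
                       const double *x, double *b,
                       const int *a, const int *flips, const int *signs,
                       int i_max, const double *c)
{
    int i, k, st;

    /* zero the outputs once, before any accumulation */
    for (k = start; k < stop; k += stride)
        b[k] = 0;

    for (i = 0; i < i_max; ++i) {
        const int flip = flips[i];
        const int sign = signs[i];
        const double ci = c[i];

        if (s) {
            /* branch-free inner loop for the s != 0 case */
            for (k = start; k < stop; k += stride) {
                st = k ^ flip;
                b[k] += x[st] * (__builtin_popcount(st & sign) % 2 ? -ci : ci);
            }
        } else {
            /* branch-free inner loop for the s == 0 case */
            for (k = start; k < stop; k += stride) {
                st = a[k] ^ flip;
                b[k] += x[st] * (__builtin_popcount(st & sign) % 2 ? -ci : ci);
            }
        }
    }
}
```

Each b[k] still receives its terms in increasing order of i, so the floating-point results should match the original nested loop. The trade-off is that b is now traversed i_max times rather than once, which may or may not pay off depending on how large the k range is relative to the cache.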