Sometimes a loop where the CPU spends most of its time contains a conditional branch that is mispredicted very often (with probability near 0.5). I've seen a few techniques in isolated threads, but never a list. The ones I already know handle cases where the condition can be turned into a bool and the resulting 0/1 is then used arithmetically in place of the branch. Are there other conditional branches that can be avoided?
For example (pseudocode):
loop () {
  if (in[i] < C )
    out[o++] = in[i++]
  ...
}
This can be rewritten, arguably losing some readability, as something like this:
loop() {
  out[o] = in[i]  // copy anyway, just don't increment
  inc = in[i] < C  // increment counters? (0 or 1)
  o += inc
  i += inc
}
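To make the idea concrete, here is a minimal C sketch of the same 0/1 trick applied to a simple filter. It is a simplified variant rather than the exact loop above: `i` always advances and only `o` is advanced conditionally. The function name `copy_less_than` and its signature are made up for illustration, and the sketch assumes `out` has room for `n` elements, because every element is stored before the comparison decides whether to keep it.

```c
#include <stddef.h>

/* Hypothetical helper: copies every element of in[0..n) that is < c into
 * out and returns how many were kept. The store is unconditional, so out
 * must have room for n elements. */
size_t copy_less_than(const int *in, size_t n, int c, int *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ++i) {
        out[o] = in[i];             /* copy anyway, just don't always advance */
        o += (size_t)(in[i] < c);   /* a comparison yields 0 or 1 in C */
    }
    return o;
}
```

Calling `copy_less_than(in, n, C, out)` returns the number of elements kept. Whether this actually beats the branchy version depends on the compiler and the data, so it is worth measuring both.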
I've also seen code in the wild that changes && to & in a conditional, in certain contexts that escape me right now. I'm a rookie at this level of optimization, but it sure feels like there must be more.
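As far as I understand the && to & trick, it looks roughly like the following C sketch. `&&` short-circuits, which can compile to an extra branch, whereas `&` evaluates both comparisons and combines their 0/1 results. The function name `count_in_range` is hypothetical, and the rewrite is only valid when both operands have no side effects and are safe to evaluate unconditionally.

```c
#include <stddef.h>

/* Hypothetical helper: counts the elements of v[0..n) with lo <= v[i] < hi.
 * Both comparisons are always evaluated; & combines their 0/1 results
 * without the short-circuit that && would introduce. */
size_t count_in_range(const int *v, size_t n, int lo, int hi)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        count += (size_t)((v[i] >= lo) & (v[i] < hi));
    }
    return count;
}
```

An optimizing compiler may already generate branch-free code for either form, so this is something to verify in the generated assembly or with a benchmark rather than assume.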