I was reading this article, and I noticed the jz instruction. This got me thinking:
Would the assembly of this code
for (int i=max;i!=0;--i){
    //Some operation
}
outperform the assembly of this code?
for (int i=0;i<max;++i){
    //Some operation
}
As long as you don't care that your data gets processed with an increasing i, there is no semantic difference. Cache behavior shouldn't suffer either, because memory can still be accessed sequentially in either direction.
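To make the "no semantic difference" claim concrete, here is a minimal sketch of the two loop shapes doing the same work. The function names `sum_up` and `sum_down` and the array-summing body are made up for illustration; they stand in for "some operation". Note that the countdown version visits i = max down to 1, so it has to index with `i - 1`, and it assumes `max` is not negative.

```c
/* Counts up: i runs 0 .. max-1. Runs zero times if max <= 0. */
long sum_up(const int *data, int max) {
    long total = 0;
    for (int i = 0; i < max; ++i) {
        total += data[i];
    }
    return total;
}

/* Counts down: i runs max .. 1, so the element index is i - 1.
   Assumes max >= 0; with a negative max the i != 0 test would not
   stop the loop where the i < max test does. */
long sum_down(const int *data, int max) {
    long total = 0;
    for (int i = max; i != 0; --i) {
        total += data[i - 1];
    }
    return total;
}
```

Both functions return the same total for any array and non-negative `max`; only the order in which the elements are visited differs.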
I'm not good enough at assembly to write examples, but I would think that the first example would only need a jz (or jnz) after the decrement. The second would need a cmp, then a conditional jump such as jg, and would also have to keep another value, max, around. The first example would only need the loop counter, because the comparison against 0 is implicit.
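Rather than guessing at the assembly, the two loops can be put into small functions and the compiler's output inspected. This is only a sketch: `sink`, `count_down`, and `count_up` are hypothetical names, and the volatile store is just a placeholder for "some operation" so that the compiler cannot delete the loops outright.

```c
/* Hypothetical stand-in for "some operation": a volatile store that the
   compiler has to keep, so the loop is not optimized away. */
static volatile int sink;

void count_down(int max) {
    /* The loop test is a comparison against zero. On x86 the decrement
       itself sets the zero flag, so no separate cmp is strictly needed. */
    for (int i = max; i != 0; --i) {
        sink = i;
    }
}

void count_up(int max) {
    /* Here i is compared against max on every iteration, so max has to
       stay available in a register or in memory. */
    for (int i = 0; i < max; ++i) {
        sink = i;
    }
}
```

Compiling this with something like `gcc -O2 -S` (which emits assembly instead of an object file) and comparing the two function bodies would show what the compiler actually generates for each form. The result depends on the compiler, the optimization level, and the target.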
This may also be an optimization that compilers already perform, but I can imagine cases where they wouldn't be able to make it.