Given N elements, process only the first (index 0) and last (index N-1) elements.
However, if N = 1, the single element should be processed only once.
Using a loop that runs once or twice, as appropriate, lets us avoid duplicating the loop body. If there's a readable way to do this, it keeps the source code smaller. It may also reduce machine-code size when the loop body is large, provided the compiler doesn't end up duplicating it anyway.
I tried incrementing by N-1, but that doesn't work when N = 1: the increment is then 0, so the loop runs forever. Are there tricks (a reverse loop, for instance) that would fix this?
for (i = 0 ; i < N ; i += (N - 1))
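A minimal sketch of one way around the N = 1 case for a single loop: clamp the step so it is never 0. The helper name `endpoint_indices` and the use of `int` indices are assumptions for illustration, not part of the original code.

```c
/* Hypothetical helper: writes the visited indices into out[] (room for 2)
   and returns how many indices were visited.
   Assumes n is small enough that (n - 1) + (n - 1) does not overflow int. */
static int endpoint_indices(int n, int out[2])
{
    /* For n == 1 the naive step n - 1 would be 0, so fall back to 1. */
    const int step = (n > 1) ? (n - 1) : 1;
    int count = 0;

    for (int i = 0; i < n; i += step)
        out[count++] = i;   /* i is 0, then n - 1 (when n > 1) */

    return count;
}
```

For n = 0 the body never runs, for n = 1 it runs once, and for n >= 2 it runs for indices 0 and n - 1.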
Edit:
My original problem involves three nested loops in the x, y, and z directions, which is why I couldn't just process elem[0] and elem[N-1] directly. I now have the following:
#define forEachLglBound(i_,j_,k_)                                   \
        for(Int i_ = 0;i_ < NPX;i_+=((NPX>1) ? (NPX-1) : 1))        \
            for(Int j_ = 0;j_ < NPY;j_+=((NPY>1) ? (NPY-1) : 1))    \
                for(Int k_ = 0;k_ < NPZ;k_+=((NPZ>1) ? (NPZ-1) : 1))
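
A quick way to sanity-check the macro is to count how many times its body runs. The sketch below assumes `Int` is a typedef for `int` and passes `NPX`, `NPY`, `NPZ` as function parameters so that different sizes can be tried; the function name `count_bound_visits` is hypothetical.

```c
typedef int Int;  /* assumption: Int is a plain integer type */

#define forEachLglBound(i_,j_,k_)                                   \
        for(Int i_ = 0;i_ < NPX;i_+=((NPX>1) ? (NPX-1) : 1))        \
            for(Int j_ = 0;j_ < NPY;j_+=((NPY>1) ? (NPY-1) : 1))    \
                for(Int k_ = 0;k_ < NPZ;k_+=((NPZ>1) ? (NPZ-1) : 1))

/* Hypothetical helper: returns how many (i, j, k) triples the macro visits.
   The parameters are named NPX/NPY/NPZ so the macro picks them up. */
static int count_bound_visits(Int NPX, Int NPY, Int NPZ)
{
    int visits = 0;
    forEachLglBound(i, j, k) {
        (void)i; (void)j; (void)k;  /* indices unused in this check */
        ++visits;
    }
    return visits;
}
```

Each direction contributes two iterations when its size is greater than 1 and one iteration when its size is exactly 1, so `count_bound_visits(4, 1, 3)` should give 2 * 1 * 2 = 4.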