If you need this non-zero guarantee for an input to __builtin_clz (which counts leading-zero bits, but gives an undefined result for an input of zero), a useful trick is y = x|1. That doesn't affect the number of leading zeros for any non-zero input: it sets the last bit position CLZ will look at, which only matters when all the higher bits are zero. And if x=1, then x|1 is still 1.
Similarly, as setup for __builtin_ctz (count trailing zeros), y = x | (1UL<<31) sets the MSB of a 32-bit integer. Or more portably, x | (1ULL << (sizeof(x)*CHAR_BIT - 1)). Setting the MSB can't change the lowest set bit of any non-zero x, so the trailing-zero count is unaffected except for x == 0.
Unconditionally setting one of the bits does change some non-zero values (any even x, for the |1 trick), so this is only useful when you care about the bit-scan result, not the value itself.
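If it helps to see the trick in context, here's a minimal sketch (the helper names are mine, and it assumes the GCC/Clang builtins and CHAR_BIT from <limits.h>):

#include <limits.h>

// Bits needed to represent x, i.e. width - clz(x), treating x == 0
// as needing 1 bit.  x|1 can't change the leading-zero count of any
// non-zero x, but keeps __builtin_clz from ever seeing a zero input.
static inline int bit_width_clz_trick(unsigned x)
{
    return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x | 1);
}

// Trailing-zero count that's safe for x == 0 (giving width-1 instead
// of UB), by unconditionally setting the MSB before __builtin_ctz.
static inline int ctz_msb_trick(unsigned x)
{
    return __builtin_ctz(x | (1u << (sizeof(x) * CHAR_BIT - 1)));
}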
To get exactly the semantics you asked for, write it as y = x ? x : 1, or as x + !x as suggested in another answer, which asks the compiler to materialize a boolean integer and add it.
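If you want to convince yourself that x + !x really matches the ternary for every input, including 0 and negative values, an exhaustive check over a narrower type is quick. A minimal sketch (the test harness is mine, not from the question):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    // int16_t is small enough to test exhaustively; the identity
    // doesn't depend on the width.
    for (int32_t i = INT16_MIN; i <= INT16_MAX; i++) {
        int16_t x = (int16_t)i;
        assert(x + !x == (x ? x : 1));
    }
    return 0;
}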
Of course, a smart compiler will hopefully pick whichever way is efficient on the target machine, no matter how you write it. For x86, one might hope for something like this, if it's ok to overwrite x:
cmp eax, 1 ; CF=1 only if EAX==0 because only 0 is < 1U
adc eax, 0 ; x += (eax < 1U)
For AArch64, tst or cmp and then cinc can conditionally copy-and-increment a value depending on any flags condition.
On the Godbolt compiler explorer:
int make_nonzero_booleanize(int x){
    return x + !x;
}

int make_nonzero_ternary(int x){
    return x ? x : 1;
}

int make_nonzero_if(int x){
    // the original. Compilers may compile it to branchless code;
    // gcc -O3 is more aggressive about if-conversion than -O2.
    int y;
    if (x)
        y = x;
    else
        y = 1;
    return y;
}
# GCC12 -O3 for x86-64
make_nonzero_booleanize(int):   # return x + !x
        cmp     edi, 1
        mov     eax, edi
        adc     eax, 0          # apparently a GCC dev already thought of this peephole and taught GCC10 about it
                                # I was expecting compilers to naively do xor-zeroing + test/setcc + add
        ret
make_nonzero_ternary(int):
        mov     eax, edi
        test    edi, edi
        mov     edx, 1
        cmove   eax, edx        # standard ternary using conditional-move
        ret
make_nonzero_if(int):
        # same asm as ternary; GCC did an if-conversion optimization
// ARM64 gcc12 -O2
make_nonzero_booleanize(int):
        cmp     w0, 0
        cinc    w0, w0, eq       // return x==0 ? x+1 : x
        ret
make_nonzero_ternary(int):
        cmp     w0, 0
        csinc   w0, w0, wzr, ne  // return x!=0 ? x : 0+1
        ret
make_nonzero_if(int):
        // identical to ternary; if-conversion happens even at -O2
Both ways are equally good; the csinc result still has a data dependency on the input w0 even in the case where the result comes from incrementing the zero register (wzr), since the select can't execute until both inputs and the flags are ready.
# RISC-V 32-bit clang 15 -O3
make_nonzero_booleanize(int):
        seqz    a1, a0          # tmp = (x==0)
        add     a0, a0, a1      # return x + tmp
        ret
make_nonzero_ternary(int):
        mv      a1, a0          # tmp = x
        li      a0, 1           # retval = 1
        beqz    a1, .LBB1_2     # if (x != 0) {
        mv      a0, a1          #   retval = tmp
.LBB1_2:                        # }
        ret                     # return retval
make_nonzero_if(int):
        # identical to ternary, same branching.
Clang didn't even do a good job with the branching strategy it picked here; it could have used bnez to branch over just a retval=1, avoiding both mv instructions. Unless there's some tuning reason, e.g. some RISC-V microarchitectures handling a branch over an mv as a conditional move? Hopefully after inlining it wouldn't be this bad, and the branch target for the ==0 special case might go somewhere else, where the common path wouldn't have to jump over it.
So x + !x is better with some real-world compilers on x86-64 and RISC-V, and equal on AArch64.
If you care about this in some specific context, look at how it compiles in your code with your specific compiler for your ISA (assuming you know enough about asm to judge what's efficient).
Older GCC versions:
GCC7 and earlier do better with the ternary, avoiding the mov eax, edi by instead doing mov eax, 1 and then using cmovnz to copy EDI over it or not. Hopefully the inefficiency in GCC8 and later only happens when this tiny function doesn't inline; GCC is known to waste instructions when it has to work around hard register constraints from calling conventions or inline asm.
GCC9 and earlier don't know the cmp/adc trick for x86-64, instead using a literal implementation of x + !x. That's unfortunately not great, because x86 sucks at materializing a boolean condition as a 0/1 integer: setcc only writes the low 8 bits of a register, so you need a separate instruction to zero the rest.
# GCC9 and earlier -O3
make_nonzero_booleanize(int):
        xor     eax, eax        # zero the full reg; sete only writes AL
        test    edi, edi        # set FLAGS according to x
        sete    al              # EAX = !x
        add     eax, edi        # EAX += x
        ret
To be fair, adc eax, 0 costs 2 uops on Intel before Sandybridge. (And GCC's tuning choices maybe didn't account for adc with immediate 0 being special-cased; in general adc costs 2 uops on Intel before Broadwell.)
Branchless might be slower if x == 0 is very rare
Even in the best case, the branchless code for x86-64, AArch64, and RISC-V is lengthening the critical path dependency chain by 2 instructions, both with 1 cycle latency on most CPUs. (For example, csinc can't run until the flags result of cmp is ready.)
But a correctly predicted branch is a control dependency, so it's handled separately by the CPU, optimistically assuming that y=x will be the case, only bailing out if it later detects a mispredict.
But writing an if() in the C source doesn't guarantee you'll get branchy asm. (An if() or an || does make it more likely to compile branchy than other ways of writing it, though.)
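If you've measured and do want the branchy version for an input that's almost never zero, a hint can nudge the compiler that way, though it's still not a guarantee. A sketch using GCC/Clang's __builtin_expect (the function name and the choice of hint are my own, not from the question):

int make_nonzero_hinted(int x)
{
    // Tell the compiler x == 0 is expected to be rare, so an
    // almost-never-taken branch may look cheaper to it than cmov.
    if (__builtin_expect(x == 0, 0))
        return 1;
    return x;
}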