When the C standards were codified, different platforms would do different things when left-shifting negative integers. On some of them, the operation might trigger implementation-specific traps whose behavior could be outside a program's control, and which could include random code execution. Nonetheless, programs written for such platforms could make deliberate use of that behavior (for example, a program could require that a user configure a system's trap handlers before running it, and then exploit the behavior of the suitably-configured handlers).
The authors of the C standard did not want to require that compilers for machines where left-shifting a negative number would trap be modified to prevent such trapping (since programs might be relying upon it). But if left-shifting a negative number is allowed to trigger a trap that could cause any arbitrary behavior, including random code execution, then left-shifting a negative number is allowed to do anything whatsoever. Hence Undefined Behavior.
In practice, until about 5 years ago, 99+% of compilers written for machines that used two's-complement math (meaning 99+% of machines made since 1990) would consistently yield the following behaviors for x<<y and x>>y, to the extent that code relying upon such behavior was considered no more non-portable than code which assumed char was 8 bits. The C standard didn't mandate such behavior, but any compiler author wanting to be compatible with a wide base of existing code would follow it.
- if y is a signed type, x << y and x >> y are evaluated as though y were cast to unsigned.
- if x is of type int, x<<y is equivalent to (int)((unsigned)x << y).
- if x is of type int and positive, x>>y is equivalent to (unsigned)x >> y. If x is of type int and negative, x>>y is equivalent to ~(~((unsigned)x) >> y) (both identities are sketched in code after this list).
- if x is of type long, similar rules apply, but with unsigned long rather than unsigned.
- if x is an N-bit type and y is greater than N-1, then x >> y and x << y may arbitrarily yield zero, or may act as though the right-hand operand were y % N; they may require extra time proportional to y [note that on a 32-bit machine, if y is negative, that could potentially be a long time, though I only know of one machine which would in practice run more than 256 extra steps]. Compilers were not necessarily consistent in their choice, but would always return one of the indicated values with no other side-effects.
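To make the first few rules concrete, here is a minimal sketch of my own (the helper names are hypothetical, and it assumes a two's-complement implementation with 32-bit int) expressing the classical signed-shift results using only well-defined unsigned operations:

#include <stdint.h>
#include <stdio.h>

/* Classical left shift of a signed value, per the rule above:
   x << y behaves as (int)((unsigned)x << y).  Assumes y < 32. */
static int32_t classic_shl(int32_t x, uint32_t y)
{
    return (int32_t)((uint32_t)x << y);
}

/* Classical arithmetic right shift, per the rules above: positive x
   behaves as (unsigned)x >> y; negative x behaves as
   ~(~((unsigned)x) >> y), replicating the sign bit.  Assumes y < 32. */
static int32_t classic_shr(int32_t x, uint32_t y)
{
    if (x >= 0)
        return (int32_t)((uint32_t)x >> y);
    return (int32_t)~(~(uint32_t)x >> y);
}

int main(void)
{
    printf("%d\n", classic_shl(-3, 4));   /* prints -48 on two's-complement */
    printf("%d\n", classic_shr(-48, 4));  /* prints -3 */
    return 0;
}

Because every shift here operates on an unsigned value, none of these expressions invokes the Undefined Behavior discussed above; they merely spell out what the classical compilers did.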
Unfortunately, for reasons I can't quite fathom, compiler writers have decided that rather than allowing programmers to indicate which assumptions compilers may use for dead-code removal, compilers should assume that no shift whose behavior isn't mandated by the C standard will ever be executed. Thus, given code like the following:
#include <stdint.h>

uint32_t shiftleft(uint32_t v, uint8_t n)
{
    if (n >= 32)
        v = 0;
    return v << n;  /* with n >= 32, this shift is still Undefined Behavior */
}
a compiler may determine that, because the shift would engage in Undefined Behavior whenever n is 32 or larger, the condition n >= 32 can never be true, and may thus omit the test and the assignment entirely. Consequently, unless or until someone comes up with a standard for C which restores the classic behaviors and allows programmers to designate which assumptions merit dead-code removal, such constructs cannot be recommended for any code that might be fed to a hyper-modern compiler.
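As a workaround (my suggestion, not anything the standard mandates; shiftleft_safe is just an illustrative name), the shift can be guarded so that the undefined case is never evaluated at all:

#include <stdint.h>

/* Guarded variant: the shift expression is only evaluated when n < 32,
   so no execution path performs a shift the standard leaves undefined. */
uint32_t shiftleft_safe(uint32_t v, uint8_t n)
{
    return (n < 32) ? (uint32_t)(v << n) : 0;
}

A compiler cannot remove this test by reasoning about Undefined Behavior, because every shift that actually executes here is well-defined.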