I have the following ostensibly simple C program:
#include <stdint.h>
#include <stdio.h>
uint16_t
foo(uint16_t *arr)
{
  unsigned int i;
  uint16_t sum = 0;
  for (i = 0; i < 4; i++) {
    sum += *arr;
    arr++;
  }
  return sum;
}
int main()
{
  uint32_t arr[] = {5, 6, 7, 8};
  printf("sum: %x\n", foo((uint16_t*)arr));
  return 0;
}
The idea is that we iterate over the array and add up its 16-bit words, ignoring overflow. When I compile this on x86-64 with gcc and no optimization I get what seems to be the correct result of 0xb (11), because it's summing the first four 16-bit words, which on a little-endian machine are 5, 0, 6, and 0:
$ gcc -O0 -o castit castit.c
$ ./castit
sum: b
$ ./castit
sum: b
$
With optimization on, it's another story:
$ gcc -O2 -o castit castit.c
$ ./castit
sum: 5577
$ ./castit
sum: c576
$ ./castit
sum: 1de6
The program produces indeterminate values for the sum.

For now I'm taking the position that this is not a compiler bug, which leads me to believe there is undefined behavior somewhere in the program; however, I can't point to the specific construct that causes it.

Note that when foo is compiled in a separately linked translation unit, the issue does not appear.