arr[i]; as a C statement compiles to zero instructions, even in a debug build, since arr isn't volatile. All the asm you're seeing is just initializing locals at fixed locations on the stack.
If you're used to looking at AT&T syntax, use the output dropdown on Godbolt to uncheck "Intel Syntax".
Also, you could get useful asm to look at by writing a function that takes a pointer arg and, for example, sums the array it points to. That way you can enable light optimization without the array optimizing away into just returning a constant. If you really want to see asm that stores array initializers to the stack instead of just taking a function arg, you could use volatile to force the compiler not to optimize the array away. You're writing this to look at the asm, not to run it, so don't write a main(). (In general see How to remove "noise" from GCC/clang assembly output?)
int sumarr(int *arr){
    register int sum = 0;   // register keyword is useful in GCC if you're going to disable optimization
    for (int i=0 ; i<1024 ; i++){
        sum += arr[i];
    }
    return sum;
}
Godbolt - gcc 12 with -Og uses an indexed addressing mode like you expect; most other optimization levels do a pointer increment.
# gcc -Og
sumarr:
        movl    $0, %eax
        movl    $0, %edx
        jmp     .L2                     # jump to the loop condition. Redundant because 0 <= 1023 is known true, so it could just fall through into the loop, but -Og maybe intentionally preserves that execution? It doesn't in general give consistent debugging.
.L3:                                    # do {
        movslq  %eax, %rcx              # sign-extend int to intptr_t since -Og doesn't optimize much
        addl    (%rdi,%rcx,4), %edx     # edx += *(rdi + rcx*4) = arr[i]
        addl    $1, %eax
.L2:                                    # loop entry point for first iteration
        cmpl    $1023, %eax
        jle     .L3                     # }while(i<=1023)
        movl    %edx, %eax              # at low optimization levels, GCC didn't sum into the return-value register in the first place.
        ret
With a normal level of optimization like -O2, we get a pointer increment:
# GCC -O2 -fno-tree-vectorize (-O2 by itself would vectorize with SIMD)
sumarr:
        leaq    4096(%rdi), %rdx        # endp = ptr + len
        xorl    %eax, %eax              # sum = 0
.L2:                                    # do {
        addl    (%rdi), %eax
        addq    $4, %rdi
        cmpq    %rdx, %rdi
        jne     .L2                     # }while(p != endp)
        ret
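To make the difference between the two loops above easier to see, here is a rough C rendering of each one. This is only a sketch of what the asm does, not compiler output; the function names sum_indexed and sum_pointer are made up for illustration.

```c
// Roughly the shape of the -Og loop: keep the counter i and
// index from the unmodified base pointer on every iteration.
int sum_indexed(const int *arr)
{
    int sum = 0;
    for (int i = 0; i <= 1023; i++) {
        sum += arr[i];              // addl (%rdi,%rcx,4), %edx
    }
    return sum;
}

// Roughly the shape of the -O2 loop: no counter at all, just a
// pointer that advances until it reaches a precomputed end pointer.
int sum_pointer(const int *arr)
{
    const int *p = arr;
    const int *endp = arr + 1024;   // leaq 4096(%rdi), %rdx
    int sum = 0;
    do {
        sum += *p;                  // addl (%rdi), %eax
        p++;                        // addq $4, %rdi
    } while (p != endp);            // cmpq %rdx, %rdi ; jne .L2
    return sum;
}
```

Both compute the same result for a 1024-element array; the compiler is free to turn either source form into either asm loop.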
The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes. In reality it should be 0, but it needs to be a multiple of 16?
That's an over-simplification, but yes in this case 0*16 = 0 is a multiple of 16 which maintains stack alignment for RSP after push rbp. The actual space it uses is in the red zone, 128 bytes below RSP.
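The amount of stack space reserved for locals is always a multiple of 8, either an odd or an even multiple depending on how many pushes it did, so the total movement of RSP is 16*n + 8. (On function entry RSP is 8 bytes away from 16-byte alignment, because the call that got us here pushed a return address.)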
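The 16*n + 8 rule can be checked with a few lines of arithmetic. The helper below is purely illustrative: min_reserve is a made-up name, and it is not GCC's actual frame-layout algorithm (GCC may reserve more than this minimum). It just computes the smallest reservation that covers the locals and brings the total RSP movement to 16*n + 8, assuming every push is 8 bytes.

```c
#include <stddef.h>

// Smallest number of bytes to subtract from RSP so that
//   8*pushes + reserved == 16*n + 8   for some n,
// and reserved >= local_bytes. Illustration only, not GCC's algorithm.
size_t min_reserve(size_t pushes, size_t local_bytes)
{
    size_t r = (local_bytes + 7) & ~(size_t)7;  // round up to a multiple of 8
    if ((8 * pushes + r) % 16 != 8) {
        r += 8;                                 // flip between odd and even multiple of 8
    }
    return r;
}
```

With one push (just RBP) and no locals that need reserving, this gives 0, matching the case in the question. With two pushes and no locals it gives 8, an odd multiple of 8.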
In this case where it doesn't need to preserve any call-preserved registers like RBX, it only pushed RBP (because -fno-omit-frame-pointer is the default at -O0.)
It doesn't sub rsp, 16 because the x86-64 SysV ABI includes a red-zone; the 128 bytes below RSP are safe from signal handlers or anything stepping on them asynchronously, so can be used without "reserving". Use -mno-red-zone to stop GCC doing that, if you want to see it reserve space for an array.
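If you do want to see space reserved for a local array, something like the following might be a reasonable thing to feed the compiler, for example with gcc -O1 -mno-red-zone. The function name local_array_demo and the array contents are made up for illustration; the volatile is what keeps the initializer stores and the load from optimizing away.

```c
// Hypothetical test function for looking at asm, not for real use.
// volatile forces the compiler to actually store the initializers to
// the stack and to actually load the selected element back.
int local_array_demo(int i)
{
    volatile int arr[4] = {10, 20, 30, 40};
    return arr[i & 3];      // mask keeps the index in bounds for any i
}
```

Without -mno-red-zone, a leaf function like this can keep the array in the red zone below RSP; with it, you should see RSP adjusted before the stores.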