Consider the following example:

#include <stdio.h>
 
int x = 0;

int main()
{
    int y = 0;

    for (int i = 0; i<10; i++){
        y += 3;
        x += 2;
    }
    return 0;
}

The corresponding assembly is:

main:
.LFB0:
  .cfi_startproc
  endbr64
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  movl  $0, -8(%rbp)
  movl  $0, -4(%rbp)
  jmp .L2
.L3:
  movl  x(%rip), %eax
  addl  $2, %eax
  movl  %eax, x(%rip)
  addl  $3, -8(%rbp)
  addl  $1, -4(%rbp)
.L2:
  cmpl  $9, -4(%rbp)
  jle .L3
  movl  $0, %eax
  popq  %rbp
  .cfi_def_cfa 7, 8
  ret

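My question is: why can the local variable y be modified directly in memory, while the global variable x is first loaded into a register, modified there, and then stored back?

I am not sure whether this is because x86 is a Harvard architecture.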

Brian Tompsett - 汤莱恩
  • x86 is not Harvard. `addl $2, x(%rip)` is legal, it's just that your compiler decided against it. Apparently you did not enable optimization because then the whole loop would have been removed. It's pointless to analyze unoptimized code. – Jester Jun 03 '23 at 19:24
  • Micro-fusion of the load+add doesn't work on Intel CPUs when the operands are immediate + RIP-relative, but separate load/add/store isn't better. That's still 3 front-end uops and 4 back-end uops, just split up into single-uop instructions. It's somewhat interesting that `gcc -O0` would handle different memory destinations differently, but it might just be global vs. local, so you might see the same difference with `-m32` where RIP-relative isn't an option. There probably isn't a heuristic in GCC's internals that chooses based on immediate + RIP-relative because of Intel micro-fusion. – Peter Cordes Jun 03 '23 at 20:01
  • Yeah, we see the same difference with absolute addressing modes in 32-bit mode for global vars. https://godbolt.org/z/M18GbM6a7 Other than that, near duplicate of [Why doesn't clang use memory-destination x86 instructions when I compile with optimization disabled? Are they efficient?](https://stackoverflow.com/q/54391268) and [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) – Peter Cordes Jun 03 '23 at 20:09
  • Thanks for your answers. I tried putting asm("addl $2, x(%rip)") inside the for loop, and it worked. But now I have another question: if I have threads, and all operations are like this one, do I still need mutexes? The operation would be done in a single instruction. – Leonardo Maia Jun 04 '23 at 02:38
  • @LeonardoMaia: "Done in a single instruction" doesn't mean it's safe. To guarantee it's safe you'd need atomic operations, like `lock addl $2, x(%rip)` (with the lock prefix). – Brendan Jun 04 '23 at 02:47
  • [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850) - that's what `std::atomic` is for. It'll compile to a `lock add`. – Peter Cordes Jun 04 '23 at 14:33
  • Thank you all, these are topics that are not so easy to find on the internet. – Leonardo Maia Jun 04 '23 at 17:35
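
To make the point about atomic operations concrete, here is a minimal sketch in C of the thread-safe version of `x += 2`. It is only an illustration under stated assumptions, not something from the original post: it assumes a C11 compiler with `<stdatomic.h>` and POSIX threads, and `run_atomic_adders`, `add_two_repeatedly`, `NUM_THREADS` and `ITERATIONS` are hypothetical names. On x86-64, GCC and clang typically compile the `atomic_fetch_add` call to a `lock add` when its return value is unused, which is the instruction the comments mention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4      /* hypothetical values chosen for the sketch */
#define ITERATIONS  10000

/* Shared counter. atomic_int makes each read-modify-write indivisible. */
static atomic_int x = 0;

/* Thread body: adds 2 to the shared counter ITERATIONS times. */
static void *add_two_repeatedly(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Atomic equivalent of x += 2; no mutex is needed for this update. */
        atomic_fetch_add(&x, 2);
    }
    return NULL;
}

/* Hypothetical helper: runs the threads and returns the final counter value. */
int run_atomic_adders(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store(&x, 0);
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, add_two_repeatedly, NULL);
        assert(rc == 0);   /* thread creation is assumed to succeed */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
    }
    return atomic_load(&x);
}
```

Calling `run_atomic_adders()` from `main` always yields 2 * NUM_THREADS * ITERATIONS, because no update can be lost. With a plain `int x` and `x += 2` (or a non-`lock`ed `addl` in inline asm), the program would have a data race and the total could come out lower.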
