Lambdas as closures taking environment. The crucial role of RIP register

Question

I looked at the assembler output for the following piece of code and was stunned:

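#include <thread>

int x = 0, y = 0;  // global

int main() {
    int r1 = 0, r2 = 0;  // local
    std::thread t([&x, &y, &r1, &r2](){  // x, y are globals: GCC warns they need no capture, Clang rejects it
        x = 1;
        r1 = y;
    });
    t.join();
}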

!std::thread t([&x, &y, &r1, &r2](){
<lambda()>::operator()(void) const+0: push   %rbp
<lambda()>::operator()(void) const+1: mov    %rsp,%rbp
<lambda()>::operator()(void) const+4: mov    %rdi,-0x8(%rbp)
<lambda()>::operator()(void) const+18: mov    -0x8(%rbp),%rax
<lambda()>::operator()(void) const+22: mov    (%rax),%rax
!   x = 1;      
<lambda()>::operator()(void) const+8: movl   $0x1,0x205362(%rip)        # 0x6062ac <x>
!   r1 = y;     
<lambda()>::operator()(void) const+25: mov    0x205359(%rip),%edx        # 0x6062b0 <y>
<lambda()>::operator()(void) const+31: mov    %edx,(%rax)
!
!});
<lambda()>::operator()(void) const+33: nop
<lambda()>::operator()(void) const+34: pop    %rbp
<lambda()>::operator()(void) const+35: retq

Why are the addresses of x and y computed relative to RIP? RIP is the instruction pointer, so this seems wild. I have never seen anything like that (perhaps I haven't seen a lot of stuff :)).

The only explanation that comes to mind is that a lambda is a closure, and that fetching environment variables from a particular place has something to do with RIP.

RIP relative addressing was introduced in x86_64. The memory reference is relative to the instruction pointer. It is useful in creating position independent code. You can find a description here: https://www.tortall.net/projects/yasm/manual/html/nasm-effaddr.html — Michael Petch, Nov 28 '16 at 09:42
http://stackoverflow.com/q/18447627/995714, http://stackoverflow.com/q/3250277/995714 — phuclv, Nov 28 '16 at 09:57
PIC is good in a 64-bit environment, as immediates can still be 32-bit. — Margaret Bloom, Nov 28 '16 at 10:40
@MichaelPetch, I cannot see how it creates position independent code. — Gilgamesz, Nov 28 '16 at 12:47
Because you're indexing relative to the position of the code's location in memory via `RIP`. — David Hoelzer, Nov 28 '16 at 14:36

score 3 · Accepted Answer · answered Nov 29 '16 at 20:00

Code doesn't move at run time: once the code section is loaded, its routines are not copied or moved around.
Static data likewise keep the same address once their section is loaded.
So the distance between an instruction and a static variable is fixed when the module is built (linked), and it is invariant under relocation of the module base, because both the instruction and the data are shifted by the same amount.

So RIP-relative addressing is not wild at all; it was a long-missing feature.
While in 32-bit code an instruction like mov eax, [var] is innocuous, in 64-bit code an absolute address that can reach anywhere in the address space takes 9 bytes (mov eax, moffs64: 1 for the opcode and 8 for the address). With RIP-relative addressing the displacement is still only 32 bits.

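C++ lambdas are syntactic sugar for a function object (an instance of an unnamed closure type), in which the captured variables become data members.
Variables captured by reference are handled as pointers/references.
Global variables need no special treatment: they are already accessible, so they don't need to be captured at all (strictly, naming them in a capture list is ill-formed; GCC accepts it with a warning).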

You rightly noted that x and y are accessed as 0x205362(%rip) and 0x205359(%rip) respectively.
Since they are globals, their addresses don't change at run time, so RIP-relative addressing can be used to access them.

However, you didn't check how r1, a captured local variable, is accessed.
It is written through (%rax), and rax was previously loaded with what amounts to movq (%rdi), %rax (the unoptimized code first spills %rdi to -0x8(%rbp) and reloads it).
In the System V x86-64 calling convention %rdi holds the first argument, which for the member function operator() is the implicit this pointer. So that instruction loads the closure's first data member into rax, and that value is then used to access r1.
Simply put, that member is a pointer (or rather a reference) to r1. Since r1 lives on the stack, its address is only known at run time: it depends on the state of the stack when the enclosing function runs.

So the lambda uses both indirect and RIP-relative addressing, which contradicts the hypothesis that RIP-relative addressing has anything special to do with closures.

Note that, unlike in ECMAScript, the capturing mechanism doesn't extend the lifetime of captured variables, so capturing a local variable by reference in a lambda passed to std::thread is nearly always a bad idea.

"Note that the capturing mechanism doesn't extend the life of capture variables (like in ECMAScript), so capturing a local var by reference in a lambda for std::thread is nearly always a bad idea." Yes, but my threads are joined before the captured locals go out of scope. @Margaret thanks! :) — Gilgamesz, Nov 29 '16 at 20:37
Even with NASM `DEFAULT ABS` or the equivalent for other assemblers, `mov eax, [var]` doesn't assemble into [`A1 moffs64`](http://www.felixcloutier.com/x86/MOV.html). Absolute `[sign-extended-disp32]` addressing is still available in x86-64, and is what you get from `mov eax, [abs var]`. (x86-32 had two redundant ways to encode it. AMD64 used the shorter one for RIP-relative, leaving the longer one.) I think in NASM you'd have to write `mov eax, [qword var]` to get what GAS calls `movabs`. — Peter Cordes, Nov 30 '16 at 07:07
Anyway, the real point of RIP-relative is that `mov eax, [var]` is not usable at all in 32-bit PIC code. The only option is going through the GOT. That's why AMD made it the best way to address static data. Other than this small nit-pick, nice answer :) — Peter Cordes, Nov 30 '16 at 07:10
