Just to be completely clear: the CALL instruction pushes the address of the instruction following it onto the stack and jumps to the target address. This means that
x: call start
y:
is morally equivalent to (ignoring that we trash %rax here):
x: lea y(%rip), %rax
push %rax
jmp start
y:
Conversely, RET pops an address from the stack and jumps to it.
Now, in your code you do popq %rsi, and later ret jumps back to whatever called you. If you just change the popq to lea str(%rip), %rsi to load %rsi with the address of str, the address pushed by the call (the address of str) is still on the stack, so ret will jump to it instead of returning to your caller! To fix your code, either pop that return address off the stack manually (add $8, %rsp) or, more sanely, move str to after the function so you don't need the awkward call.
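For illustration, the first fix (keeping the call that skips over the string, but discarding the address it pushed) might look roughly like the sketch below. This is a reconstruction under assumptions, not your actual code: the file name q.s, the function name print, and the local label 1 are all made up here, and the build commands are simply the ones used for p.s.

```asm
# q.s (hypothetical sketch; names "print" and "1" are assumptions)
#
# Assumed to build the same way as p.s:
# gcc -c -fPIC -o q.o q.s
# gcc -fPIC -nostdlib -o q -Wl,-estart q.o
    .text
    .global start
start:
    call print              # pushes the address of the next instruction
    mov $60, %rax           # sys_exit
    mov $0, %rdi            # exit status 0
    syscall

print:
    call 1f                 # pushes the address of str and jumps over the string
str:
    .string "test\n"
1:
    lea str(%rip), %rsi     # load the address of str without popping it
    add $8, %rsp            # discard the address pushed by "call 1f"
    movq $1, %rax           # sys_write
    movq $1, %rdi           # fd 1 (stdout)
    movq $5, %rdx           # length of "test\n"
    syscall
    ret                     # the top of the stack is print's return address again
```

Without the add $8, %rsp, the final ret would pop the address of str and start executing the string bytes as code.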
Updated with a complete standalone example:
# p.s
#
# Compile using:
# gcc -c -fPIC -o p.o p.s
# gcc -fPIC -nostdlib -o p -Wl,-estart p.o
    .text
    .global start           # So we can use it as an entry point
start:
    movq $1, %rax           # sys_write
    movq $1, %rdi           # fd 1 (stdout)
    lea str(%rip), %rsi     # address of str, computed relative to %rip
    movq $5, %rdx           # length of "test\n"
    syscall
    mov $60, %rax           # sys_exit
    mov $0, %rdi            # exit status 0
    syscall

    .data
str:
    .string "test\n"
Disassembling the result with objdump -d p shows that the code is indeed position independent, even when using .data: str is reached through a %rip-relative lea rather than an absolute address.
p: file format elf64-x86-64
Disassembly of section .text:
000000000040010c <start>:
40010c: 48 c7 c0 01 00 00 00 mov $0x1,%rax
400113: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
40011a: 48 8d 35 1b 00 20 00 lea 0x20001b(%rip),%rsi # 60013c <str>
400121: 48 c7 c2 05 00 00 00 mov $0x5,%rdx
400128: 0f 05 syscall
40012a: 48 c7 c0 3c 00 00 00 mov $0x3c,%rax
400131: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
400138: 0f 05 syscall