In short, when I have multiple db directives in my .data section, the label addresses come out wrong when the file is assembled by NASM. In my testing, they are off by 256 bytes in the resulting Mach-O binary.
Software I am using:
- OS X 10.10.5
- nasm: NASM version 2.11.08, installed via Homebrew as required for x86_64 ASM
- gobjdump: GNU objdump (GNU Binutils) 2.25.1, installed via Homebrew
- clang: Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
What works:
Take for example the following "hello world" NASM assembly.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msg]
mov rdx, len
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msg: db "Hello, world!", 10
len: equ $ - msg
Compiled and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
clang -o main main.o
./main
This works great, and produces the following output:
Hello, world!
What doesn't:
Now, to add another message, we just need to add another string to the data section, and another syscall. Simple enough.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msga]
mov rdx, lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msgb]
mov rdx, lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msga: db "Hello, world!", 10
lena: equ $ - msga
msgb: db "Break things!", 10
lenb: equ $ - msgb
Compile and run, same as before, and we get:
Break things!
What?!? Shouldn't we be getting this instead?
Hello, world!
Break things!
What's wrong?:
Something clearly went wrong. Time to disassemble the resulting binary and see what we got.
$ gobjdump -d -M intel main
Produces the following for _main:
0000000100000f7c <_main>:
100000f7c:  b8 04 00 00 02        mov eax,0x2000004
100000f81:  bf 01 00 00 00        mov edi,0x1
100000f86:  48 8d 35 73 01 00 00  lea rsi,[rip+0x173] # 100001100 <msgb+0xf2>
100000f8d:  ba 0e 00 00 00        mov edx,0xe
100000f92:  0f 05                 syscall
100000f94:  b8 04 00 00 02        mov eax,0x2000004
100000f99:  bf 01 00 00 00        mov edi,0x1
100000f9e:  48 8d 35 69 00 00 00  lea rsi,[rip+0x69] # 10000100e <msgb>
100000fa5:  ba 0e 00 00 00        mov edx,0xe
100000faa:  0f 05                 syscall
100000fac:  b8 01 00 00 02        mov eax,0x2000001
100000fb1:  bf 00 00 00 00        mov edi,0x0
100000fb6:  0f 05                 syscall
From the comment # 100001100 <msgb+0xf2>, we can see that the first lea points not to the msga symbol but to 0xf2 bytes past msgb, at address 0x100001100 (there are only null bytes at this address, so nothing is printed). Inspecting the binary in a hex editor, I find the actual msga string at file offset 0x1000, or address 0x100001000. This means that the address in the assembled binary is now off by 0x100 (256) bytes, simply because there is now a second db label. What?!?
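To double-check the arithmetic, here is a small Python sketch used purely as a calculator. It assumes the usual x86-64 rule that a RIP-relative operand resolves to the address of the next instruction plus the displacement. The addresses and displacements are copied from the gobjdump listing above; the variable names are made up for illustration.

```python
# Addresses of the instructions that FOLLOW each lea, taken from the
# gobjdump listing. RIP-relative targets are computed from these.
NEXT_AFTER_FIRST_LEA = 0x100000F8D
NEXT_AFTER_SECOND_LEA = 0x100000FA5

# Displacements encoded in the two lea instructions.
first_target = NEXT_AFTER_FIRST_LEA + 0x173   # meant to be msga
second_target = NEXT_AFTER_SECOND_LEA + 0x69  # meant to be msgb

# Where the msga string actually sits, per the hex editor inspection.
MSGA_ACTUAL = 0x100001000

print(hex(first_target))                 # address the first lea loads
print(hex(second_target))                # address the second lea loads
print(hex(first_target - MSGA_ACTUAL))   # error in the msga reference
print(hex(second_target - MSGA_ACTUAL))  # distance from msga to msgb
```

The second reference lands exactly 0xe (14) bytes after msga, which matches the length of "Hello, world!" plus the newline, so only the first reference is wrong.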
A sorry excuse for a workaround:
As an experiment, I decided to try putting each of the two db definitions into its own ASM/object file and linking all three together. Doing so works.
main.s
global _main
extern _msga
extern _lena
extern _msgb
extern _lenb
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msga]
mov rdx, _lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msgb]
mov rdx, _lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
msga.s
global _msga
global _lena
section .data
_msga: db "Hello, world!", 10
_lena: equ $ - _msga
msgb.s
global _msgb
global _lenb
section .data
_msgb: db "Break things!", 10
_lenb: equ $ - _msgb
Compile and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
/usr/local/bin/nasm -f macho64 -o msga.o msga.s
/usr/local/bin/nasm -f macho64 -o msgb.o msgb.s
clang -o main msga.o msgb.o main.o
./main
Results in:
Hello, world!
Break things!
While this does work, I find it hard to believe this is the best solution.
What is going wrong?
Surely there must be a way to have multiple db labels in one ASM file? Am I doing something wrong in the way I write the ASM? Is this a bug in NASM? Or is this somehow expected behavior, and if so, why? My workaround is extra work and clutter, so I would greatly appreciate any assistance with this.