I'm writing in assembly using clang 13.1.6 with MacOS Monterey 12.5 on an ARM64 M1 Pro laptop.
If I try to use .dword/.xword in the .text section with the address of a label as its value, my program crashes on startup with a bus error.
Minimal reproducible example:
.text
.balign 4
.global _main
_main:
// accepted method to load from static address
adrp x1, vector@GOTPAGE
ldr x1, [x1, #vector@GOTPAGEOFF]
// now x1 contains the address of vector
ldr x2, [x1]
// now x2 should contain the address of dest
br x2
dest:
mov x0, #0
ret
vector:
.xword dest
This assembles and links without errors or warnings using cc reloc.s -o reloc, but bus errors immediately when run, apparently before even reaching my actual code. The backtrace from lldb is as follows:
* thread #1, stop reason = EXC_BAD_ACCESS (code=2, address=0x100003fb0)
frame #0: 0x000000010001da54 dyld`invocation function for block in dyld4::Loader::applyFixupsGeneric(Diagnostics&, dyld4::RuntimeState&, dyld3::Array<void const*> const&, dyld3::Array<void const*> const&, bool, dyld3::Array<dyld4::Loader::MissingFlatLazySymbol> const&) const + 60
dyld`invocation function for block in dyld4::Loader::applyFixupsGeneric(Diagnostics&, dyld4::RuntimeState&, dyld3::Array<void const*> const&, dyld3::Array<void const*> const&, bool, dyld3::Array<dyld4::Loader::MissingFlatLazySymbol> const&) const:
-> 0x10001da54 <+60>: str x19, [x20]
0x10001da58 <+64>: ldp x29, x30, [sp, #0x20]
0x10001da5c <+68>: ldp x20, x19, [sp, #0x10]
0x10001da60 <+72>: add sp, sp, #0x30
Target 0: (a.out) stopped.
(lldb) bt
* thread #1, stop reason = EXC_BAD_ACCESS (code=2, address=0x100003fb0)
* frame #0: 0x000000010001da54 dyld`invocation function for block in dyld4::Loader::applyFixupsGeneric(Diagnostics&, dyld4::RuntimeState&, dyld3::Array<void const*> const&, dyld3::Array<void const*> const&, bool, dyld3::Array<dyld4::Loader::MissingFlatLazySymbol> const&) const + 60
frame #1: 0x0000000100040fd4 dyld`invocation function for block in dyld3::MachOLoaded::fixupAllChainedFixups(Diagnostics&, dyld_chained_starts_in_image const*, unsigned long, dyld3::Array<void const*>, void (void*, void*) block_pointer) const + 424
frame #2: 0x0000000100041080 dyld`dyld3::MachOLoaded::walkChain(Diagnostics&, dyld3::MachOLoaded::ChainedFixupPointerOnDisk*, unsigned short, bool, unsigned int, void (dyld3::MachOLoaded::ChainedFixupPointerOnDisk*, bool&) block_pointer) const + 104
frame #3: 0x00000001000412b0 dyld`dyld3::MachOLoaded::forEachFixupInSegmentChains(Diagnostics&, dyld_chained_starts_in_segment const*, bool, void (dyld3::MachOLoaded::ChainedFixupPointerOnDisk*, dyld_chained_starts_in_segment const*, bool&) block_pointer) const + 208
frame #4: 0x0000000100040e04 dyld`dyld3::MachOLoaded::forEachFixupInAllChains(Diagnostics&, dyld_chained_starts_in_image const*, bool, void (dyld3::MachOLoaded::ChainedFixupPointerOnDisk*, dyld_chained_starts_in_segment const*, bool&) block_pointer) const + 96
frame #5: 0x0000000100040d98 dyld`dyld3::MachOLoaded::fixupAllChainedFixups(Diagnostics&, dyld_chained_starts_in_image const*, unsigned long, dyld3::Array<void const*>, void (void*, void*) block_pointer) const + 120
frame #6: 0x000000010001da0c dyld`invocation function for block in dyld4::Loader::applyFixupsGeneric(Diagnostics&, dyld4::RuntimeState&, dyld3::Array<void const*> const&, dyld3::Array<void const*> const&, bool, dyld3::Array<dyld4::Loader::MissingFlatLazySymbol> const&) const + 136
frame #7: 0x000000010001d788 dyld`dyld4::Loader::applyFixupsGeneric(Diagnostics&, dyld4::RuntimeState&, dyld3::Array<void const*> const&, dyld3::Array<void const*> const&, bool, dyld3::Array<dyld4::Loader::MissingFlatLazySymbol> const&) const + 204
frame #8: 0x0000000100021574 dyld`dyld4::JustInTimeLoader::applyFixups(Diagnostics&, dyld4::RuntimeState&, dyld4::DyldCacheDataConstLazyScopedWriter&, bool) const + 604
frame #9: 0x000000010000d904 dyld`dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) + 1928
frame #10: 0x000000010000d06c dyld`start + 488
The crash appears to be inside the dynamic linker, not in my code at all.
An even simpler example with the same behavior is:
.text
.balign 4
.global _main
_main:
ldr x1, =dest
br x1
dest:
mov x0, #0
ret
Here ldr x1, =dest is supposed to similarly assemble the address of dest into the literal pool (a nearby location within the .text section) and load from there into x1. This also bus errors.
Equivalent code works fine on ARM64 Linux.
Why is this, and how do I fix it?