I've tried to reduce the code to something more minimal to demonstrate the problem.
BITS 64
global _start:function      
global BIG_BAD_BLOCK:data   
section .rodata     progbits  alloc   noexec  nowrite  align=4 
    hc_str_a: db "Example",   0x0
section .bss        nobits    alloc   noexec  write    align=4 
    personZ:  resb  20  ;
    personX:  resb  20  ;
section FAKE_HEAP   nobits    alloc   noexec  write    align=1
    NEXT_ADDR:      resq 1      ; pointer to the next available byte within the block
    BIG_BAD_BLOCK:  resb 204800 ; 200 KB chunk of memory
;   cpu instructions
section .text       progbits  alloc   exec    nowrite  align=16
    _start:                              ; start(argc, argv, envp)                               // the kernel calls _start() with the args provided by the execve() system call
        mov rdi, rsp                     ; (int*)          rdi         = rsp                     // argc is the first thing on the stack
        add rdi, 8                       ; (char**)        rdi         = (unsigned long) rdi + 8 // argv begins 8 bytes after argc
        mov ecx, dword [rsp]             ; (unsigned int)  ecx         = *((int*) rsp)           // argc : the kernel passes initial program arguments on the stack, rather than by registers
        mov eax, ecx                     ; (unsigned int)  eax         = ecx
        mov ebx, 8                       ; (unsigned int)  ebx         = 8
        mul ebx                          ; (unsigned int)  eax         = argc * 8                // how many bytes long is the argv array
        add eax, 8                       ; (unsigned int)  eax        += 8                       // byte length of argv + 8 byte offset for argv's trailing null
        add rax, rdi                     ; (char**)        rax         = (unsigned long) argv + eax
        mov rsi, rax                     ; (char**)        rsi         = envp                    //
        mov eax, ecx                     ; (unsigned int)  eax         = argc                    // 
        call init                        ; init(argc, argv, envp)     
        nop                              ;                                                       // ignore the do-nothing instruction
    init:
        push rax                         ;                                                       // save the register we are going to clobber (argc)
        push rdi                         ;                                                       // save the register we are going to clobber (argv)
        push rsi                         ;                                                       // save the register we are going to clobber (envp)
        push rbp                         ; (stackframe*)  (--rsp)      = (stackframe*) rbp       // save copy of old top-of-stack at the new top-of-stack 8 bytes down
        mov rbp, rsp                     ; (stackframe*)   rbp         = rsp                     // (this provides us a fixed pointer to the old top-of-stack)
        call init_heap                   ; init_heap()                                           // let's ignore this for now
        mov rax, qword [rbp + 24]        ; (unsigned int)  eax         = argc                    // pushed first, so it sits highest above the saved rbp
        mov rdi, qword [rbp + 16]        ; (char**)        rdi         = argv
        mov rsi, qword [rbp +  8]        ; (char**)        rsi         = envp
        call main                        ; (unsigned int)  eax         = main(argc, argv, envp)
        call exit                        ; exit(eax)
    main:
        nop
        mov eax, 0                       ; (unsigned int)  eax         = 0
        ret                              ; return 0;
    init_heap:
        mov qword [NEXT_ADDR], BIG_BAD_BLOCK    ; NEXT_ADDR will start by pointing to the first byte of BIG_BAD_BLOCK
                                                ; NOTE: there is no ret here, so execution falls through into malloc below and returns from there
    malloc: ; malloc(byteCount)
        push qword [NEXT_ADDR]
        add qword [NEXT_ADDR], rax      ; NEXT_ADDR += byteCount
        pop rax                         ; rax = (void*) memoryChunk
        ret
    exit:               ; exit(statusCode)
        mov rdi, rax    ; rdi = (int) statusCode
        mov rax, 60     ; rax = (unsigned long int) 60   // system call #60 is SYS_exit
        syscall         ; SYS_exit(statusCode)           // tell kernel to kill this process
;   To assemble:
;   nasm -felf64 -gdwarf -o HeapProblem.o ./HeapProblem.asm
;   To link:
;   ld -o HeapProblem.bin HeapProblem.o
I assemble and link using the commands above. This is a single assembly file: no includes, no macros, no libraries, not even libc. It is just the one file you see there, assembled with the Netwide Assembler (NASM) and linked with ld, and that one object file is the only input to the linker.
This implies:
- A traditional malloc is not being loaded.
- No system calls to brk are being made.
- No system calls to mmap are being made.
Nothing is happening except what you see in that one assembly file. No other code should be interacting with this binary in any way, except for the Linux kernel itself, which loads the binary into memory when a shell invokes execve() on the path to the bin file.
After assembling and linking the file, we go to execute/debug it.
gdb HeapProblem.bin
b *_start
run one simple test
info proc mappings
process 5423
Mapped address spaces:
          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /var/www/html/ASM/HeapProblem.bin
            0x600000           0x601000     0x1000        0x0 /var/www/html/ASM/HeapProblem.bin
            0x601000           0x633000    0x32000        0x0 [heap]
      0x7ffff7ffb000     0x7ffff7ffd000     0x2000        0x0 [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0 [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
maintenance info sections
Exec file:
    `/var/www/html/ASM/HeapProblem.bin', file type elf64-x86-64.
 [0]     0x004000b0->0x00400124 at 0x000000b0: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
 [1]     0x00400124->0x0040012c at 0x00000124: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
 [2]     0x0060012c->0x00600158 at 0x0000012c: .bss ALLOC
 [3]     0x00600158->0x00632160 at 0x0000012c: FAKE_HEAP ALLOC
 [4]     0x00000000->0x00000030 at 0x0000012c: .debug_aranges READONLY HAS_CONTENTS
 [5]     0x00000000->0x00000053 at 0x0000015c: .debug_info READONLY HAS_CONTENTS
 [6]     0x00000000->0x0000001b at 0x000001af: .debug_abbrev READONLY HAS_CONTENTS
 [7]     0x00000000->0x00000066 at 0x000001ca: .debug_line READONLY HAS_CONTENTS
print & BIG_BAD_BLOCK
$1 = (<data variable, no debug info> *) 0x600160
So. The first mapping is to the binary itself for the .text and .rodata sections. Cool. That makes sense.
The second mapping is also to the binary, seemingly for the .bss and FAKE_HEAP sections. Which is also more-or-less what we expected. Though it should be noted that the second mapping is larger than what is needed for .bss, but not large enough to completely fit both .bss and FAKE_HEAP. It can only contain .bss and part of FAKE_HEAP.
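To put numbers on that observation, here is a small Python sketch of the arithmetic. The constants are copied from the gdb output above; the `overlap` helper and the variable names are only illustrative and are not part of the program.

```python
# Section boundaries from "maintenance info sections" and the second
# mapping from "info proc mappings" (all ranges are half-open: [start, end)).
BSS_START, BSS_END = 0x0060012C, 0x00600158
FAKE_HEAP_START, FAKE_HEAP_END = 0x00600158, 0x00632160
MAP2_START, MAP2_END = 0x600000, 0x601000


def overlap(a_start, a_end, b_start, b_end):
    """Return the number of bytes shared by two half-open address ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


bss_size = BSS_END - BSS_START
fake_heap_size = FAKE_HEAP_END - FAKE_HEAP_START
map2_size = MAP2_END - MAP2_START

# How much of each section falls inside the second mapping.
bss_in_map2 = overlap(BSS_START, BSS_END, MAP2_START, MAP2_END)
fake_heap_in_map2 = overlap(FAKE_HEAP_START, FAKE_HEAP_END, MAP2_START, MAP2_END)
fake_heap_outside_map2 = fake_heap_size - fake_heap_in_map2

print(f".bss size:                    {bss_size} bytes")
print(f"FAKE_HEAP size:               {fake_heap_size} bytes")
print(f"second mapping size:          {map2_size} bytes")
print(f".bss inside mapping 2:        {bss_in_map2} bytes")
print(f"FAKE_HEAP inside mapping 2:   {fake_heap_in_map2} bytes")
print(f"FAKE_HEAP past mapping 2:     {fake_heap_outside_map2} bytes")
```

With these inputs, all 44 bytes of .bss fit in the second mapping, but only 3,752 of FAKE_HEAP's 204,808 bytes (8 for NEXT_ADDR plus 204,800 for BIG_BAD_BLOCK) do. The remaining 201,056 bytes lie past 0x601000.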
Then we've got the 3rd mapping, marked as [heap].
I expected 1 of 2 things to happen:
A) The kernel would fail to recognize my FAKE_HEAP section as a true heap, and would simply include the entire thing in the same segment as .bss
OR
B) The kernel would recognize my FAKE_HEAP as being an unusual/non-standard section with attributes that are consistent with a heap, and would thus mark the entire section as a heap. With the mapping start and end addresses exactly matching the memory address onto which FAKE_HEAP was loaded, and its natural end-boundary.
What actually happened:
The kernel seems to have recognized a heap that starts at an arbitrary point within my FAKE_HEAP. It does not align. My FAKE_HEAP starts at 0x600158, with BIG_BAD_BLOCK starting at 0x600160. The kernel says the heap starts at 0x601000. That is 3,752 bytes into my structure. Which does not make sense at all. There's no reason that the kernel should think the heap begins 3,752 bytes past the beginning of this structure.
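As a sanity check on those offsets, the sketch below redoes the arithmetic in Python and compares the [heap] boundaries with page boundaries. It assumes 4 KiB pages, which the 0x1000-sized mappings above suggest but do not prove. The helper names are illustrative only.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_floor(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


def page_ceil(addr):
    """Round an address up to the next page boundary (no-op if already aligned)."""
    return page_floor(addr + PAGE_SIZE - 1)


# Values copied from the gdb output.
FAKE_HEAP_START = 0x600158
BIG_BAD_BLOCK = 0x600160
FAKE_HEAP_END = 0x632160
HEAP_START, HEAP_END = 0x601000, 0x633000

offset_from_section = HEAP_START - FAKE_HEAP_START
offset_from_symbol = HEAP_START - BIG_BAD_BLOCK
first_boundary_in_section = page_ceil(FAKE_HEAP_START)
section_end_rounded_up = page_ceil(FAKE_HEAP_END)

print(f"[heap] start - FAKE_HEAP start:   {offset_from_section} bytes")
print(f"[heap] start - BIG_BAD_BLOCK:     {offset_from_symbol} bytes")
print(f"first page boundary in FAKE_HEAP: {first_boundary_in_section:#x}")
print(f"FAKE_HEAP end rounded up:         {section_end_rounded_up:#x}")
print(f"[heap] start is that boundary:    {HEAP_START == first_boundary_in_section}")
print(f"[heap] end is the rounded end:    {HEAP_END == section_end_rounded_up}")
```

Under that assumption, the [heap] range runs from the first page boundary inside FAKE_HEAP to the end of FAKE_HEAP rounded up to a page. That shows where the boundaries fall, but not why the region is labeled [heap].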
So, finally, a restatement of the question(s):
Should the kernel be detecting the FAKE_HEAP section or BIG_BAD_BLOCK symbol as a heap at all?
If so, why does the start address not match up with either the section or symbol start address?
If not, why is a heap being detected at all?
How is this heap being detected?
I need to understand why this is happening. Because I cannot find a clear logical or programmatic reason for this behavior. I've been researching this problem for the past 12 hours straight and I cannot figure this out. I've been searching for issues in the assembly, the linker, and the kernel itself.
 
    