I am trying to wrap my head around the ELF file format and how objects get linked and executed.
To learn more, I analyzed the executable produced from the following assembly code:
#----------------------------------------------------------------------------------------
# Exits immediately using code 3. Runs on 64-bit Linux 
#     gcc -c basic.s && ld basic.o && ./a.out
#----------------------------------------------------------------------------------------
        .global _start
        .text
_start:
        # _exit(3)
        mov $60, %rax       # system call 60 is exit
        mov $3,  %rdi       # we want return code 3
        syscall             # invoke operating system to exit
I am using readelf and hd to analyze the output a.out.
Here are the relevant outputs:
readelf -l a.out:
Elf file type is EXEC (Executable file)
Entry point 0x401000
There are 2 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000010 0x0000000000000010  R E    0x1000
 Section to Segment mapping:
  Segment Sections...
   00     
   01     .text
Excerpts of hd a.out:
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  00 10 40 00 00 00 00 00  |..>.......@.....|
00000020  40 00 00 00 00 00 00 00  e0 10 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  02 00 40 00 05 00 04 00  |....@.8...@.....|
00000040  01 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00  |..@.......@.....|
00000060  b0 00 00 00 00 00 00 00  b0 00 00 00 00 00 00 00  |................|
00000070  00 10 00 00 00 00 00 00  01 00 00 00 05 00 00 00  |................|
00000080  00 10 00 00 00 00 00 00  00 10 40 00 00 00 00 00  |..........@.....|
00000090  00 10 40 00 00 00 00 00  10 00 00 00 00 00 00 00  |..@.............|
000000a0  10 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
[...]
and
[...]
00001000  48 c7 c0 3c 00 00 00 48  c7 c7 03 00 00 00 0f 05  |H..<...H........|
[...]
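To double-check that these bytes really are the three instructions from the source, here is a minimal Python sketch. It is not a general disassembler: it only recognizes the two encodings that appear in this dump (`48 c7 /0 imm32` for the MOVs and `0f 05` for SYSCALL), and the names `decode` and `REGS` are my own.

```python
import struct

# The 16 bytes that hd shows at file offset 0x1000.
code = bytes.fromhex("48c7c03c000000" "48c7c703000000" "0f05")

# ModRM byte -> register, for the two registers used in the source only.
REGS = {0xC0: "%rax", 0xC7: "%rdi"}

def decode(buf):
    """Decode only the encodings present in this dump."""
    out, i = [], 0
    while i < len(buf):
        if buf[i:i + 2] == b"\x48\xc7":
            # REX.W + C7 /0: mov imm32 (sign-extended) into a 64-bit register
            reg = REGS[buf[i + 2]]
            (imm,) = struct.unpack_from("<i", buf, i + 3)
            out.append(f"mov ${imm}, {reg}")
            i += 7
        elif buf[i:i + 2] == b"\x0f\x05":
            out.append("syscall")
            i += 2
        else:
            raise ValueError(f"unexpected byte at offset {i}")
    return out

insns = decode(code)
print(len(code), insns)
```

The two MOVs take 7 bytes each and SYSCALL takes 2, which matches the FileSiz of 0x10 reported for the second LOAD segment.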
As you can see from this analysis, there is a LOAD segment containing the section '.text', which is 0x10 (16) bytes long and holds the instructions "MOV, MOV, SYSCALL" at file offset 0x1000, all as expected. However, there is also another LOAD segment, this one with no sections mapped to it, covering file offsets 0x0000 to 0x00b0, which is exactly the space occupied by the ELF header + program header 0 + program header 1. I have tried a minimal C program, and gcc creates a similar segment there as well.
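To show where the 0x00b0 figure comes from, here is a rough Python sketch that unpacks the first 0xb0 bytes of the hexdump above using the ELF64 little-endian field layout. The byte string is copied from the hd output; the variable names (`ehdr`, `segments`, `headers_end`) are my own.

```python
import struct

# First 0xb0 bytes of a.out, copied from the hd output.
dump = """
7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00
02 00 3e 00 01 00 00 00  00 10 40 00 00 00 00 00
40 00 00 00 00 00 00 00  e0 10 00 00 00 00 00 00
00 00 00 00 40 00 38 00  02 00 40 00 05 00 04 00
01 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00
00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00
b0 00 00 00 00 00 00 00  b0 00 00 00 00 00 00 00
00 10 00 00 00 00 00 00  01 00 00 00 05 00 00 00
00 10 00 00 00 00 00 00  00 10 40 00 00 00 00 00
00 10 40 00 00 00 00 00  10 00 00 00 00 00 00 00
10 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00
"""
data = bytes.fromhex("".join(dump.split()))

# ELF64 header: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx
ehdr = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
e_entry, e_phoff = ehdr[4], ehdr[5]
e_ehsize, e_phentsize, e_phnum = ehdr[8], ehdr[9], ehdr[10]

# ELF64 program header: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align
fields = ("type", "flags", "offset", "vaddr", "paddr", "filesz", "memsz", "align")
segments = [
    dict(zip(fields, struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)))
    for i in range(e_phnum)
]

# End of the program header table = end of ELF header + all program headers.
headers_end = e_phoff + e_phnum * e_phentsize

print(hex(e_entry), e_ehsize, e_phentsize, e_phnum, hex(headers_end))
for seg in segments:
    print({k: hex(v) for k, v in seg.items()})
```

With a 64-byte ELF header and two 56-byte program headers, `headers_end` comes out to 64 + 2 * 56 = 176 = 0xb0, which is exactly the FileSiz of the first LOAD segment.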
The question: Why? To what end? Why is it necessary to define this first segment, and what does it have to do with the execution of this program?
Bonus question: Why is the actual machine code put at offset 0x1000? Wouldn't it have been more efficient to put it at 0x00b0, right after the headers?
I am using Ubuntu 20.04 with gcc 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) on an Intel x86_64 CPU.
