So recently I learned about the perf command in linux. I decided to run some experiments, so I created an empty c program and measured how many instructions it took to run: 
echo 'int main(){}'>emptyprogram.c && gcc -O3 emptyprogram.c -o empty
perf stat ./empty
This was the output:
 Performance counter stats for './empty':
      0.341833      task-clock (msec)         #    0.678 CPUs utilized          
             0      context-switches          #    0.000 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
           112      page-faults               #    0.328 M/sec                  
     1,187,561      cycles                    #    3.474 GHz                    
     1,550,924      instructions              #    1.31  insn per cycle         
       293,281      branches                  #  857.966 M/sec                  
         4,942      branch-misses             #    1.69% of all branches        
   0.000504121 seconds time elapsed
Why is it using so many instructions to run a program that does literally nothing? I thought that maybe this was some baseline number of instructions that are necessary to load a program into the OS, so I looked for a minimal executable written in assembly, and I found a 142 byte executable that outputs "Hi World" here (http://timelessname.com/elfbin/)
Running perf stat on the 142 byte hello executable, I get:
Hi World
 Performance counter stats for './hello':
      0.069185      task-clock (msec)         #    0.203 CPUs utilized          
             0      context-switches          #    0.000 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
             3      page-faults               #    0.043 M/sec                  
       126,942      cycles                    #    1.835 GHz                    
       116,492      instructions              #    0.92  insn per cycle         
        15,585      branches                  #  225.266 M/sec                  
         1,008      branch-misses             #    6.47% of all branches        
   0.000340627 seconds time elapsed
This still seems a lot higher than I'd expect, but we can accept it as a baseline. In that case, why did running empty take 10x more instructions? What did those instructions do? And if they're some sort of overhead, why is there so much variation in overhead between a C program and the helloworld assembly program?
 
    