I'm trying to understand how to measure performance, so I wrote a very simple program:
section .text
    global _start
_start:
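    mov rax, 60      ; syscall number 60 = exit on x86-64 Linux
    syscall          ; rdi is never set, so the exit status is whatever rdi holds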
I ran the program with perf stat ./bin and was surprised by how high stalled-cycles-frontend was:
      0.038132      task-clock (msec)         #    0.148 CPUs utilized          
             0      context-switches          #    0.000 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
             2      page-faults               #    0.052 M/sec                  
       107,386      cycles                    #    2.816 GHz                    
        81,229      stalled-cycles-frontend   #   75.64% frontend cycles idle   
        47,654      instructions              #    0.44  insn per cycle         
                                              #    1.70  stalled cycles per insn
         8,601      branches                  #  225.559 M/sec                  
           929      branch-misses             #   10.80% of all branches        
   0.000256994 seconds time elapsed
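As a sanity check on how I'm reading the right-hand column, here is a small Python sketch that recomputes the derived figures from the raw counters in the perf stat output above. The formulas are my assumption about how perf derives them (simple ratios of the counters); the variable and function names are my own, not anything perf defines.

```python
# Raw counter values copied from the perf stat output above.
cycles = 107_386
stalled_frontend = 81_229
instructions = 47_654
branches = 8_601
branch_misses = 929
task_clock_ms = 0.038132


def derived_metrics():
    """Recompute the derived column, assuming each entry is a plain ratio."""
    return {
        # Share of all cycles in which the front end was counted as stalled.
        "frontend_idle_pct": 100 * stalled_frontend / cycles,
        # Instructions retired per cycle.
        "insn_per_cycle": instructions / cycles,
        # Front-end stalled cycles per retired instruction.
        "stalled_per_insn": stalled_frontend / instructions,
        # Share of branches that were mispredicted.
        "branch_miss_pct": 100 * branch_misses / branches,
        # Cycles per second of task-clock time, expressed in GHz.
        "ghz": cycles / (task_clock_ms * 1e-3) / 1e9,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>18}: {value:.3f}")
```

The recomputed values agree with what perf printed, so the 75.64% figure is just stalled-cycles-frontend divided by cycles.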
As I understand it, stalled-cycles-frontend counts cycles in which the CPU front end has to wait for some operation (e.g. a bus transaction) to complete.
So what caused the CPU front end to wait for most of the time in this simplest of cases?
And why 2 page faults? I didn't read any memory pages.
 
    