I tried to measure the speed difference between plain loops, loops using the loop instruction, and built-in rep loops. I wrote three programs to compare their behavior:
Program 1
_start: xor %ecx,%ecx
        not %ecx        # ecx = 0xffffffff
0:      dec %ecx
        jnz 0b
        mov $1,%eax
        xor %ebx,%ebx
        int $0x80       # syscall 1: exit
Program 2
_start: xor %ecx,%ecx
        not %ecx
        loop .
        mov $1,%eax
        xor %ebx,%ebx
        int $0x80
Program 3
_start: xor %ecx,%ecx
        not %ecx
        rep nop         # Intended: do nothing but decrement ecx
        mov $1,%eax
        xor %ebx,%ebx
        int $0x80
It turned out that the third program doesn't work as expected, and some research tells me that rep nop, aka pause, does something completely unrelated.
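A minimal sketch of how this can be checked, assuming 32-bit Linux (or a 64-bit kernel with int $0x80 emulation enabled) and a build along the lines of as --32 followed by ld -m elf_i386. The start value 3 is arbitrary; the program returns whatever is left in ecx as its exit status. If rep nop counted ecx down, the status would be 0. Since rep nop assembles to the same bytes as pause (f3 90) and pause does not modify any register, I would expect the status to stay 3:
Check program
        .globl _start
_start: mov $3,%ecx
        rep nop         # same encoding as pause (f3 90); ecx is left alone
        mov $1,%eax
        mov %ecx,%ebx   # exit status = value left in ecx
        int $0x80       # syscall 1: exit
The exit status can be read in the shell with echo $? right after running the binary.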
What do the rep, repz and repnz prefixes do when the instruction following them is not a string instruction?
 
     
    