I'm currently learning x86 assembly language and wondered which is the better way to implement loops. One way is to mov a value into the ecx register and use the loop instruction. The other is to use a jmp instruction, followed by the loop body, and then a conditional jump back to the beginning of the loop body. I guess the first one has better readability, but other than that I don't know why to use it.
- I never know when to accept an answer, since there may always be a better one, I guess. Is this really important? I really don't know. – rob Jul 24 '11 at 17:49
- Related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this): it's almost always best to use a `do{}while()` structure in asm, with a conditional branch at the bottom. If the loop might need to run 0 times, then jmp to the bottom is one strategy, but usually not the best. – Peter Cordes Feb 18 '18 at 07:00
1 Answer
When you mention jmp+body+test, I believe you are talking about the translation of a while loop in high-level languages.  There is a reason for the second approach.  Let's take a look.
Consider
x = N
while (x != 0) {
    BODY
    x--
}
The naive way is
    mov ecx, N      ; store var x in ecx register
top:
    cmp ecx, 0      ; test at top of loop
    je bottom       ; loop exit when while condition false
    BODY
    dec ecx
    jmp top
bottom:
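This executes a conditional jump on every test (N that fall through into the body, plus one final taken jump that exits the loop) and N unconditional jumps, one per iteration.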
The second way is:
    mov ecx, N 
    jmp bottom
top:
    BODY
    dec ecx
bottom:
    cmp ecx, 0
    jne top
Now the conditional jump is still executed once per test, but there is only ONE unconditional jump, no matter how large N is. A small saving, but it just might matter, especially because it is inside a loop.
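To make the jump counts concrete, here is a minimal C sketch that mirrors both layouts with goto and tallies the jumps as they execute. The `jump_counts` struct and both function names are hypothetical, introduced only for this illustration, and BODY is reduced to a counter. Note that the conditional count includes the final test that exits the loop, so it comes out as N + 1.

```c
#include <stdint.h>

/* Hypothetical tally type, not part of the original answer. */
typedef struct {
    uint32_t body;          /* times BODY ran */
    uint32_t conditional;   /* conditional jump instructions executed */
    uint32_t unconditional; /* unconditional jmp instructions executed */
} jump_counts;

/* Layout 1: test at the top, jmp back at the bottom. */
jump_counts count_top_test(uint32_t n)
{
    jump_counts c = {0, 0, 0};
    uint32_t ecx = n;               /* mov ecx, N */
top:
    c.conditional++;                /* cmp ecx, 0 / je bottom */
    if (ecx == 0) goto bottom;
    c.body++;                       /* BODY */
    ecx--;                          /* dec ecx */
    c.unconditional++;              /* jmp top */
    goto top;
bottom:
    return c;
}

/* Layout 2: one jmp into the test, conditional jump at the bottom. */
jump_counts count_bottom_test(uint32_t n)
{
    jump_counts c = {0, 0, 0};
    uint32_t ecx = n;               /* mov ecx, N */
    c.unconditional++;              /* jmp bottom */
    goto bottom;
top:
    c.body++;                       /* BODY */
    ecx--;                          /* dec ecx */
bottom:
    c.conditional++;                /* cmp ecx, 0 / jne top */
    if (ecx != 0) goto top;
    return c;
}
```

Calling both functions with the same n should give identical `body` and `conditional` counts, while `unconditional` is n for the first layout and always 1 for the second.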
Now, you did mention the loop instruction, which is essentially the following (except that loop itself leaves the flags untouched):
    dec ecx
    cmp ecx, 0
    jne somewhere
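As a rough C model of that behaviour, assuming the 32-bit ecx counter used throughout this answer: the instruction decrements the counter and takes the jump while the result is nonzero. The function name `loop_step` is hypothetical, and the flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough model of `loop target` (hypothetical helper):
   decrement ecx, then report whether the jump would be taken. */
bool loop_step(uint32_t *ecx)
{
    *ecx -= 1;          /* unsigned arithmetic wraps: 0 becomes 0xFFFFFFFF */
    return *ecx != 0;   /* jump is taken while the counter is nonzero */
}
```

Starting from ecx = 0, the first step wraps the counter around to 0xFFFFFFFF and reports the jump as taken, which is why a loop built on this instruction has to treat a zero count specially.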
How would you work that in? Probably like this:
    mov ecx, N
    cmp ecx, 0       ; Must guard against N==0
    je bottom
top:
    BODY
    loop top         ; built-in dec, test, and jump if not zero
bottom:
This is a pretty little solution, typical of CISC processors. Is it faster than the second way above? That depends a great deal on the microarchitecture. If you really want to know more, I suggest researching the performance of the loop instruction on IA-32 and Intel 64 processors.
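For comparison, the guard-plus-bottom-test shape of that last listing corresponds to an `if` wrapped around a `do { } while` in C. The sketch below is only an illustration under that assumption: `run_guarded` is a hypothetical name, and BODY is again reduced to a counter. The same shape applies whether the bottom of the loop is a single loop instruction or a dec ecx followed by jnz.

```c
#include <stdint.h>

/* Returns how many times the (hypothetical) body runs for a given n. */
uint32_t run_guarded(uint32_t n)
{
    uint32_t body_runs = 0;
    if (n != 0) {               /* cmp ecx, 0 / je bottom */
        do {
            body_runs++;        /* BODY */
        } while (--n != 0);     /* loop top, or dec ecx / jnz top */
    }
    return body_runs;
}
```

With the guard in place the body runs exactly n times, including zero times for n = 0, and the steady-state loop contains only the single conditional jump at the bottom.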
 
    
- Thanks, that helped quite a bit. I'll try to do some further research on the speed of the ecx loop :) – rob Jul 24 '11 at 09:45
- @rob, happy researching. May I suggest http://www.agner.org/optimize/optimizing_assembly.pdf ? An amazing resource. Very long. On page 89 it mentions that you should avoid JECXZ and LOOP because they are not so efficient on more modern architectures. – Ray Toal Jul 24 '11 at 16:59
- Related: [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently) for some historical factors. Fun fact: AMD Bulldozer / Ryzen have fast `loop`, but nothing else does. Also related: [Why are loops always compiled like this?](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this) for efficient loop structures: as you say, conditional branch at the bottom, and various strategies if it might need to run 0 times. – Peter Cordes Feb 18 '18 at 07:01
