Background
I've assumed for a while that gcc will convert while-loops into do-while form. (See Why are loops always compiled into "do...while" style (tail jump)?)
Specifically, I assumed that at -O0 a while-loop of this form...
while (test-expr)
    body-statement
...will generate code in the form of a jump-to-middle do-while:
    goto test;
loop:
    body-statement
test:
    if (test-expr) goto loop;
And gcc -O2 will generate a guarded do-while:
if (!test-expr)
    goto done;
loop:
    body-statement
    if (test-expr) goto loop;
done:
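To make the shapes concrete, here is a minimal C sketch that spells each one out with explicit gotos for a summing loop. The function names (sum_while, sum_jump_to_middle, sum_guarded_do) are made up for illustration, and this is a hand-written rendering of the patterns above, not compiler output.

```c
#include <stddef.h>

/* Reference shape: an ordinary while-loop. */
int sum_while(const int a[], size_t n) {
    int s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Jump-to-middle do-while: enter the loop at the test. */
int sum_jump_to_middle(const int a[], size_t n) {
    int s = 0;
    size_t i = 0;
    goto test;
loop:
    s += a[i];
    i++;
test:
    if (i < n) goto loop;
    return s;
}

/* Guarded do-while: one inverted test up front, then the test at the bottom. */
int sum_guarded_do(const int a[], size_t n) {
    int s = 0;
    size_t i = 0;
    if (!(i < n)) goto done;
loop:
    s += a[i];
    i++;
    if (i < n) goto loop;
done:
    return s;
}
```

All three return the same result for any n, including n == 0. The only difference is where the test sits and how many times it appears in the code.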
Concrete examples
Here are godbolt examples of functions for which gcc generates the kind of control flow I'm describing above (I use for-loops but a while loop will give the same code).
This simple function...
#include <stddef.h>  /* for size_t */

int sum1(int a[], size_t N) {
    int s = 0;
    for (size_t i = 0; i < N; i++) {
        s += a[i];
    }
    return s;
}
...will, at -O0, generate this jump-to-middle code:
sum1:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-24], rdi
        mov     QWORD PTR [rbp-32], rsi
        mov     DWORD PTR [rbp-4], 0
        mov     QWORD PTR [rbp-16], 0
        jmp     .L2
.L3:
        mov     rax, QWORD PTR [rbp-16]
        lea     rdx, [0+rax*4]
        mov     rax, QWORD PTR [rbp-24]
        add     rax, rdx
        mov     eax, DWORD PTR [rax]
        add     DWORD PTR [rbp-4], eax
        add     QWORD PTR [rbp-16], 1
.L2:
        mov     rax, QWORD PTR [rbp-16]
        cmp     rax, QWORD PTR [rbp-32]
        jb      .L3
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret
At -O2, it generates this guarded-do code:
sum1:
        test    rsi, rsi
        je      .L4
        lea     rdx, [rdi+rsi*4]
        xor     eax, eax
.L3:
        add     eax, DWORD PTR [rdi]
        add     rdi, 4
        cmp     rdi, rdx
        jne     .L3
        ret
.L4:
        xor     eax, eax
        ret
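For readers who prefer C to assembly, the -O2 listing can be mirrored roughly like this. It is a hand-written sketch (the name sum1_o2_shape is hypothetical), with each line annotated with the instruction it is meant to correspond to; it is not something gcc emits.

```c
#include <stddef.h>

/* Hand-written mirror of the -O2 listing: guard, then a bottom-tested loop. */
int sum1_o2_shape(const int *a, size_t n) {
    if (n == 0)              /* test rsi, rsi / je .L4 */
        return 0;            /* .L4: xor eax, eax / ret */
    const int *end = a + n;  /* lea rdx, [rdi+rsi*4] */
    int s = 0;               /* xor eax, eax */
    do {
        s += *a;             /* add eax, DWORD PTR [rdi] */
        a++;                 /* add rdi, 4 */
    } while (a != end);      /* cmp rdi, rdx / jne .L3 */
    return s;                /* ret */
}
```

Note that the index has been replaced by a pointer that is compared against an end pointer, and that the loop body contains exactly one branch per iteration.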
My question
What I'm after is a hand-wavy rule to apply when looking at -Os loops. I'm more used to looking at -O2 code, and now that I'm working in the embedded field, where -Os is more prevalent, I'm surprised by the form of the loops I see.
It seems that gcc -Og and -Os both generate code with a jmp at the bottom and an if() break at the top. Clang, on the other hand, generates a guarded do-while. A godbolt link to gcc and clang output
Here is an example of gcc -Os output for the above function:
sum1:
        xor     eax, eax
        xor     r8d, r8d
.L2:
        cmp     rax, rsi
        je      .L5
        add     r8d, DWORD PTR [rdi+rax*4]
        inc     rax
        jmp     .L2
.L5:
        mov     eax, r8d
        ret
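Written back as C, the control flow of that -Os listing looks roughly like the sketch below. The name sum1_os_shape is hypothetical and the code is my own transcription of the assembly, annotated with the instruction each line is meant to match.

```c
#include <stddef.h>

/* Hand-written mirror of the -Os listing: test at the top, jmp at the bottom. */
int sum1_os_shape(const int *a, size_t n) {
    size_t i = 0;            /* xor eax, eax */
    int s = 0;               /* xor r8d, r8d */
    for (;;) {               /* .L2: */
        if (i == n)          /* cmp rax, rsi / je .L5 */
            break;
        s += a[i];           /* add r8d, DWORD PTR [rdi+rax*4] */
        i++;                 /* inc rax */
    }                        /* jmp .L2 */
    return s;                /* .L5: mov eax, r8d / ret */
}
```

Compared with the guarded do-while, the test appears only once in the code, but every iteration executes two branches: the conditional exit at the top and the unconditional jump at the bottom.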
- Am I correct in assuming that gcc -Og and -Os generate code in the form I described above?
- Does anyone have a resource that describes the rationale for using while-form for -Og and -Os? Is it by design, or an accidental fallout from the way the optimization passes are organized?
- I thought that converting loops into do-while form was part of the early canonicalization done by compilers. How come gcc -O0 generates do-while but gcc -Og gives while-loops? Does that canonicalization only happen when optimization is enabled?
Sidenote: I'm surprised by how much the code generated with -Os and -O2 differs, given that there aren't many compiler flags that are different. Maybe many passes check some variable like tradeoff_speed_vs_space.
