Empty __asm__ statements are not enough: better use data dependencies
Like this:
main.c
int main(void) {
    unsigned i;
    for (i = 0; i < 10; i++) {
        __asm__ volatile("" : "+g" (i) : :);
    }
}
Compile and disassemble:
gcc -O3 -ggdb3 -o main.out main.c
gdb -batch -ex 'disas main' main.out
Output:
   0x0000000000001040 <+0>:     xor    %eax,%eax
   0x0000000000001042 <+2>:     nopw   0x0(%rax,%rax,1)
   0x0000000000001048 <+8>:     add    $0x1,%eax
   0x000000000000104b <+11>:    cmp    $0x9,%eax
   0x000000000000104e <+14>:    jbe    0x1048 <main+8>
   0x0000000000001050 <+16>:    xor    %eax,%eax
   0x0000000000001052 <+18>:    retq 
I believe that this is robust, because it places an explicit data dependency on the loop variable i as suggested at: Enforcing statement order in C++ and produces the desired loop:
This marks i as an input and output of inline assembly. Then, inline assembly is a black box for GCC, which cannot know how it modifies i, so I think that really can't be optimized away.
If I do the same with an empty __asm__ as in:
bad.c
int main(void) {
    unsigned i;
    for (i = 0; i < 10; i++) {
        __asm__ volatile("");
    }
}
it appears to completely remove the loop and outputs:
   0x0000000000001040 <+0>:     xor    %eax,%eax
   0x0000000000001042 <+2>:     retq
Also note that __asm__("") and __asm__ volatile("") should be the same since there are no output operands: The difference between asm, asm volatile and clobbering memory
What is happening becomes clearer if we replace it with:
__asm__ volatile("nop");
which produces:
   0x0000000000001040 <+0>:     nop
   0x0000000000001041 <+1>:     nop
   0x0000000000001042 <+2>:     nop
   0x0000000000001043 <+3>:     nop
   0x0000000000001044 <+4>:     nop
   0x0000000000001045 <+5>:     nop
   0x0000000000001046 <+6>:     nop
   0x0000000000001047 <+7>:     nop
   0x0000000000001048 <+8>:     nop
   0x0000000000001049 <+9>:     nop
   0x000000000000104a <+10>:    xor    %eax,%eax
   0x000000000000104c <+12>:    retq
So we see that GCC just loop unrolled the nop loop in this case because the loop was small enough.
So, if you rely on an empty __asm__, you would be relying on hard to predict GCC binary size/speed tradeoffs, which if applied optimally, should would always remove the loop for an empty __asm__ volatile(""); which has code size zero.
noinline busy loop function
If the loop size is not known at compile time, full unrolling is not possible, but GCC could still decide to unroll in chunks, which would make your delays inconsistent.
Putting that together with Denilson's answer, a busy loop function could be written as:
void __attribute__ ((noinline)) busy_loop(unsigned max) {
    for (unsigned i = 0; i < max; i++) {
        __asm__ volatile("" : "+g" (i) : :);
    }
}
int main(void) {
    busy_loop(10);
}
which disassembles at:
Dump of assembler code for function busy_loop:
   0x0000000000001140 <+0>:     test   %edi,%edi
   0x0000000000001142 <+2>:     je     0x1157 <busy_loop+23>
   0x0000000000001144 <+4>:     xor    %eax,%eax
   0x0000000000001146 <+6>:     nopw   %cs:0x0(%rax,%rax,1)
   0x0000000000001150 <+16>:    add    $0x1,%eax
   0x0000000000001153 <+19>:    cmp    %eax,%edi
   0x0000000000001155 <+21>:    ja     0x1150 <busy_loop+16>
   0x0000000000001157 <+23>:    retq   
End of assembler dump.
Dump of assembler code for function main:
   0x0000000000001040 <+0>:     mov    $0xa,%edi
   0x0000000000001045 <+5>:     callq  0x1140 <busy_loop>
   0x000000000000104a <+10>:    xor    %eax,%eax
   0x000000000000104c <+12>:    retq   
End of assembler dump.
Here the volatile is was needed to mark the assembly as potentially having side effects, since in this case we have an output variables.
A double loop version could be:
void __attribute__ ((noinline)) busy_loop(unsigned max, unsigned max2) {
    for (unsigned i = 0; i < max2; i++) {
        for (unsigned j = 0; j < max; j++) {
            __asm__ volatile ("" : "+g" (i), "+g" (j) : :);
        }
    }
}
int main(void) {
    busy_loop(10, 10);
}
GitHub upstream.
Related threads:
Tested in Ubuntu 19.04, GCC 8.3.0.