Ok, I think I can finally answer my own question (thanks Wyzard for the helpful comment!)
The obvious solution, since there doesn't seem to be any library call doing this, is to put a hlt instruction in inline assembly. Unfortunately, this crashed my program. The reason is that the default DPMI server runs the program in ring 3, and hlt is a privileged instruction reserved for ring 0. So to use it, you have to modify the loader stub to load a DPMI server that runs your program in ring 0; see below.
Browsing through the docs, I came across __dpmi_yield(). If we are running in a multitasking environment (Windows 3.x or 9x ...), the operating system already provides a DPMI server, and in that case we want to give up our time slice while waiting instead of trying the privileged hlt.
So, putting it all together, the source for DOS now looks like this:
#undef __STRICT_ANSI__
#include <time.h>
#include <dpmi.h>
#include <errno.h>
static uclock_t nextTick;
static uclock_t tickTime;
static int haveYield;
void
ticker_init(void)
{
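    /* __dpmi_yield() sets errno if the DPMI host doesn't support
       releasing the time slice, so use it as a probe */
    errno = 0;
    __dpmi_yield();
    haveYield = errno ? 0 : 1;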
}
void
ticker_done(void)
{
}
void
ticker_start(int msec)
{
    /* widen before multiplying: msec * UCLOCKS_PER_SEC overflows int */
    tickTime = (uclock_t)msec * UCLOCKS_PER_SEC / 1000;
    nextTick = uclock() + tickTime;
}
void
ticker_stop(void)
{
}
void
ticker_wait(void)
{
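    if (haveYield)
    {
        /* multitasking host: give away the rest of our time slice */
        while (uclock() < nextTick) __dpmi_yield();
    }
    else
    {
        /* plain DOS: halt until the next interrupt (needs ring 0) */
        while (uclock() < nextTick) __asm__ volatile ("hlt");
    }
    /* advance from the previous deadline, not from "now", to avoid drift */
    nextTick += tickTime;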
}
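The tick arithmetic in ticker_start() and ticker_wait() can be checked in isolation. The following is a minimal portable sketch, not DJGPP code: it assumes UCLOCKS_PER_SEC is 1193180 (the PIT input frequency) and uses hypothetical names (msec_to_uclocks, struct ticker_model, model_start, model_poll). It models only the deadline bookkeeping, without the actual waiting, to show why the multiplication should be widened and why the deadline is advanced from the old deadline rather than from the current time.

```c
#include <stdint.h>

/* Assumption: DJGPP's UCLOCKS_PER_SEC is the PIT input frequency,
 * 1193180 Hz. Redefined here so the sketch builds on any compiler. */
#define ASSUMED_UCLOCKS_PER_SEC 1193180LL

/* Hypothetical helper: milliseconds -> uclock() ticks. Widening to
 * 64 bits first matters: in 32-bit int arithmetic, msec * 1193180
 * overflows as soon as msec reaches 1800. */
static int64_t
msec_to_uclocks(int msec)
{
    return (int64_t)msec * ASSUMED_UCLOCKS_PER_SEC / 1000;
}

/* Hypothetical model of the ticker state, without the actual waiting. */
struct ticker_model
{
    int64_t next;    /* absolute deadline of the next tick */
    int64_t period;  /* tick length in uclock() ticks */
};

/* Mirrors ticker_start(): the first deadline is one period from "now". */
static void
model_start(struct ticker_model *t, int64_t now, int msec)
{
    t->period = msec_to_uclocks(msec);
    t->next = now + t->period;
}

/* Mirrors the end of ticker_wait(): returns 1 once the deadline has
 * passed and advances it by one period from the old deadline, so a late
 * wake-up shortens the next wait instead of shifting every later tick.
 * Returns 0 and leaves the deadline unchanged if the tick is not due. */
static int
model_poll(struct ticker_model *t, int64_t now)
{
    if (now < t->next)
        return 0;
    t->next += t->period;
    return 1;
}
```

For example, a 10 ms tick started at time 0 gets its first deadline at 11931; a wake-up at 12000 (69 ticks late) moves the next deadline to 23862 rather than to 12000 + 11931.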
For the hlt path in ticker_wait() to work on plain DOS, the loader stub in the compiled executable must be modified like this:
<path to>/stubedit bin/csnake.exe dpmi=CWSDPR0.EXE
CWSDPR0.EXE is a DPMI server that runs all code in ring 0.
I still had to test whether yielding messes with the timing under Windows 3.x / 9x, in case the time slices are too long. Update: the code above works great in Windows 95.
Using the hlt instruction breaks compatibility with DOSBox 0.74 in a weird way: the program seems to hang forever on a blocking getch() through PDCurses. This doesn't happen on real MS-DOS 6.22 running in VirtualBox. Update: this is a bug in DOSBox 0.74 that is fixed in the current SVN tree.
Given those findings, I assume this is the best way to wait "nicely" in a DOS program.
Update: It's possible to do even better by checking all available methods and picking the best one. I found a DOS idle call that should be considered as well. The strategy:
- If yield is supported, use it (we are running in a multitasking environment).
- If idle is supported, use it. Optionally, if we're in ring 0, execute a hlt each time before calling idle, because idle is documented to return immediately when no other program is ready to run.
- Otherwise, in ring 0, just use plain hlt instructions.
- Busy-wait as a last resort.
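The priority order in this strategy can be written as a small pure function that takes the probe results as flags, which keeps the decision separate from the DPMI calls and easy to test. This is a sketch only: the enum, struct and function names are hypothetical, and the three flags are assumed to come from probes like the ones in the example program that follows.

```c
#include <stdbool.h>

/* Hypothetical names for the four strategies in the list above. */
enum idle_method
{
    IDLE_YIELD,    /* release the time slice, e.g. via __dpmi_yield() */
    IDLE_DOSIDLE,  /* DOS idle call */
    IDLE_HLT,      /* plain hlt, only legal in ring 0 */
    IDLE_BUSY      /* busy-wait as a last resort */
};

struct idle_plan
{
    enum idle_method method;
    bool hlt_before_idle;  /* optional hlt before each idle call (ring 0 only) */
};

/* Pick the first usable method in priority order. */
static struct idle_plan
choose_idle_method(bool have_yield, bool have_dosidle, bool ring0)
{
    struct idle_plan p = { IDLE_BUSY, false };

    if (have_yield)
    {
        p.method = IDLE_YIELD;
    }
    else if (have_dosidle)
    {
        p.method = IDLE_DOSIDLE;
        /* idle may return at once, so a preceding hlt saves more CPU,
           but only when the privileged instruction is allowed */
        p.hlt_before_idle = ring0;
    }
    else if (ring0)
    {
        p.method = IDLE_HLT;
    }
    return p;
}
```

The function only decides; the wait loop would then switch on p.method and call __dpmi_yield(), the idle call, hlt, or nothing inside its uclock() loop.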
Here's a little example program (DJGPP) that tests for all possibilities:
#include <stdio.h>
#include <dpmi.h>
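#include <errno.h>
#include <string.h>
static unsigned int ring;
static int
haveDosidle(void)
{
    __dpmi_regs regs;
    /* clear all fields so no garbage segment values reach real mode */
    memset(&regs, 0, sizeof regs);
    regs.x.ax = 0x1680;
    __dpmi_int(0x28, &regs);
    /* AL == 0 after the call is taken to mean "idle call supported" */
    return regs.h.al ? 0 : 1;
}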
int main(void)
{
    puts("checking idle methods:");
    fputs("yield (int 0x2f 0x1680): ", stdout);
    errno = 0;
    __dpmi_yield();
    if (errno)
    {
        puts("not supported.");
    }
    else
    {
        puts("supported.");
    }
    fputs("idle (int 0x28 0x1680): ", stdout);
    if (!haveDosidle())
    {
        puts("not supported.");
    }
    else
    {
        puts("supported.");
    }
    fputs("ring-0 HLT instruction: ", stdout);
    /* the low two bits of the CS selector hold the current privilege level */
    __asm__ ("mov %%cs, %0\n\t"
             "and $3, %0" : "=r" (ring));
    if (ring)
    {
        printf("not supported. (running in ring-%u)\n", ring);
    }
    else
    {
        puts("supported. (running in ring-0)");
    }
    return 0;
}
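The ring check in main() works because of the x86 segment selector layout: bits 0-1 are the privilege level field, bit 2 selects GDT or LDT, and bits 3-15 are the descriptor index. For the CS register this privilege field equals the current privilege level. The following portable sketch decodes a selector passed in as a plain number; the helper names and the selector values in the usage are hypothetical and only illustrate the bit layout.

```c
/* Privilege level field (bits 0-1); for CS this is the CPL. */
static unsigned
selector_rpl(unsigned short sel)
{
    return sel & 3u;
}

/* Table indicator (bit 2): 0 = GDT, 1 = LDT. */
static unsigned
selector_is_ldt(unsigned short sel)
{
    return (sel >> 2) & 1u;
}

/* Descriptor index (bits 3-15). */
static unsigned
selector_index(unsigned short sel)
{
    return sel >> 3;
}

/* hlt is privileged: executing it with CPL != 0 raises a
   general-protection fault, which is why it crashed under ring 3. */
static int
hlt_allowed(unsigned short cs)
{
    return selector_rpl(cs) == 0;
}
```

For example, a selector of 0x0008 decodes to index 1 in the GDT at privilege level 0, so hlt would be allowed, while 0x00A7 decodes to index 20 in the LDT at privilege level 3, so it would not.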
The code in my GitHub repo reflects these changes.