TL;DR: it's the assembler, not the compiler.
OK, so I did some research into this. It's not really allowed, but what actually rejects it is the assembly pass, not the compiler. Trying the following fails:
#include <stdio.h>
extern int $func();
int main() {
    int myvar = 13;
    int $var = 42;
    printf("%d\n", myvar);
    printf("%d\n", $var);
    $func();
}
joshua@nova:/tmp$ gcc -c test.c
/tmp/ccg7zLVB.s: Assembler messages:
/tmp/ccg7zLVB.s:31: Error: operand type mismatch for `call'
joshua@nova:/tmp$
I pulled K&R C, second edition (which covers ANSI C), off my shelf, and it says: "Identifiers are a sequence of letters and digits. The first character must be a letter; the underscore _ character counts as a letter. Upper and lower case letters are different. Identifiers may have any length ... [obsolete verbiage omitted]."
This reference has clearly aged, and almost everybody now accepts non-ASCII Unicode characters as letters. What's going on is that the back-end assembler sees symbols bytewise, and every byte with the high bit set counts as a letter. If you're crazy enough to use Shift-JIS outside of string literals, chaos can ensue; otherwise this tends to work well enough.
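To make the bytewise view concrete, here is a minimal sketch of the rule as described above. It is a simplified model, not the assembler's real code, and the function names `is_symbol_byte` and `all_symbol_bytes` are made up for illustration.

```c
/* Simplified model of the rule described above, NOT the real assembler code:
   ASCII letters, digits and '_' are symbol bytes, and so is any byte with
   the high bit set (0x80..0xFF). Both function names are hypothetical. */
int is_symbol_byte(unsigned char c)
{
    if (c >= 0x80) return 1;              /* high bit set: treated as a letter */
    if (c >= 'a' && c <= 'z') return 1;
    if (c >= 'A' && c <= 'Z') return 1;
    if (c >= '0' && c <= '9') return 1;
    return c == '_';
}

/* Returns 1 if every byte of a non-empty name is a symbol byte, else 0. */
int all_symbol_bytes(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    if (*p == '\0') return 0;             /* an empty name is not a symbol */
    for (; *p != '\0'; ++p) {
        if (!is_symbol_byte(*p)) return 0;
    }
    return 1;
}
```

Under this model a UTF-8 name such as "caf\xc3\xa9" passes, because both bytes of the accented letter have the high bit set. A Shift-JIS character whose second byte falls in the ASCII range does not: 0x95 0x5C ends in a backslash byte, which is the kind of chaos mentioned above.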
I accessed a draft of C18, which defines identifier-nondigit as one of: nondigit, universal-character-name, or other implementation-defined characters. Therefore, implementations are allowed to permit additional characters.
For universal-character-name, we have a restriction: "A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ( $ ), 0040 ( @ ), or 0060 ( ` ), nor one in the range D800 through DFFF inclusive."
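The quoted restriction is easy to misread, so here it is as a small predicate. This is only a sketch of that one sentence; the name `ucn_allowed` is hypothetical, and it says nothing about whether a given compiler or assembler then accepts the character inside an identifier.

```c
/* Sketch of the quoted constraint on universal character names only.
   The function name is hypothetical. Returns 1 if the code point may be
   written as \uXXXX or \UXXXXXXXX under that sentence, 0 otherwise. */
int ucn_allowed(unsigned long cp)
{
    /* Surrogate range D800..DFFF is always excluded. */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0;

    /* Below 00A0, only $ (0024), @ (0040) and ` (0060) are permitted. */
    if (cp < 0x00A0UL) {
        return cp == 0x0024UL || cp == 0x0040UL || cp == 0x0060UL;
    }
    return 1;
}
```

So `ucn_allowed(0x24)` is 1, which is why writing $ as \U00000024 is permitted at this level, while `ucn_allowed(0x41)` is 0: an ordinary letter such as A may not be spelled as a universal character name.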
The following code still chokes at the assembly pass as expected:
#include <stdio.h>
extern int \U00000024func();
int main()
{
    return \U00000024func();
}
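The following code, with the $ moved to the end of the name, builds. My guess is that this is because a leading $ marks an immediate operand in the AT&T syntax GNU as uses on x86, which would explain the "operand type mismatch" error above: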
#include <stdio.h>
extern int func\U00000024();
int main()
{
    return func\U00000024();
}