8

I was under the impression that variable names could only start with letters and _. However, while experimenting, I found that they can also start with $, like so:

Code

#include <stdio.h>

int main() {
    int myvar=13;
    int $var=42;
    printf("%d\n", myvar);
    printf("%d\n", $var);
}

Output

13
42

This resource says that you can't start variable names with $ in C, which is wrong (at least when compiled with my gcc, which reports itself as Apple LLVM version 10.0.1 (clang-1001.0.46.4)). Other resources that I found online also suggest that variables can't start with $, which is why I'm confused.

Do these articles just fail to mention this nuance, and if so, why is this a feature of C?

Jay Mody

5 Answers

7

In the C 2018 standard, clause 6.4.2, paragraph 1 permits implementations to allow additional characters in identifiers.

It defines an identifier to be an identifier-nondigit character followed by any number of identifier-nondigit or digit characters. It defines digit to be “0” to “9”, and it defines the identifier-nondigit characters to be:

  • a nondigit, which is one of underscore, “a” to “z”, or “A” to “Z”,
  • a universal-character-name, or
  • other implementation-defined characters.

Thus, implementations may define other characters that are allowed in identifiers.
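The grammar above can be sketched as a small checker. This is only an illustration, not how any compiler implements it: the names is_identifier and allow_dollar are hypothetical, allow_dollar stands in for the implementation-defined extension, and universal-character-names and keywords are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c is one of the characters in set (the terminating NUL never matches). */
static bool in_set(const char *set, char c)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* digit: "0" to "9". */
static bool is_digit_char(char c)
{
    return in_set("0123456789", c);
}

/* identifier-nondigit, restricted to the basic nondigits plus an optional '$'.
 * allow_dollar is a hypothetical switch modelling an implementation that
 * accepts '$' as an "other implementation-defined character". */
static bool is_identifier_nondigit(char c, bool allow_dollar)
{
    if (in_set("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", c))
        return true;
    return allow_dollar && c == '$';
}

/* identifier: one identifier-nondigit, then any mix of identifier-nondigits and digits. */
bool is_identifier(const char *s, bool allow_dollar)
{
    if (!is_identifier_nondigit(s[0], allow_dollar))
        return false;
    for (size_t i = 1; s[i] != '\0'; ++i) {
        if (!is_identifier_nondigit(s[i], allow_dollar) && !is_digit_char(s[i]))
            return false;
    }
    return true;
}
```

With allow_dollar set to false, "$var" is rejected while "myvar" and "_x1" are accepted; with it set to true, "$var" and "SYS$SOMETHING" are accepted as well.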

The characters included as universal-character-name are those listed in ranges in Annex D of the C standard.

The resource you link to is wrong in several places:

Variable names in C are made up of letters (upper and lower case) and digits.

This is false; identifiers may include underscores and the above universal characters in every conforming implementation and other characters in implementations that permit them.

$ not allowed -- only letters, and _

This is incorrect. The C standard does not require an implementation to allow “$”, but it does not disallow an implementation from allowing it. “$” is allowed by some implementations and not others. It can be said not to be a part of strictly conforming C programs, but it may be a part of conforming C programs.

Eric Postpischil
  • Hey; on accessing the C18 draft I think I found a difference between the standard and gcc. A leading $ might be intended to work if expressed as a unicode codepoint. See my recent edit. – Joshua Sep 16 '19 at 01:47
3

This answers your question:

In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.

S.S. Anne
Déjà vu
2

This is allowed in GCC and LLVM because many traditional C implementations allow identifiers like this.

One reason is VMS, where many system library routines have names like SYS$SOMETHING.

Here's a link to the GCC docs describing this:

https://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html

S.S. Anne
1

TL;DR: it's the assembler, not the compiler

Ok, so I did some research into this. It's not really allowed, but what excludes it is the assembly pass. Trying to do the following fails:

#include <stdio.h>

extern int $func();

int main() {
    int myvar=13;
    int $var=42;
    printf("%d\n", myvar);
    printf("%d\n", $var);
    $func();
}

joshua@nova:/tmp$ gcc -c test.c
/tmp/ccg7zLVB.s: Assembler messages:
/tmp/ccg7zLVB.s:31: Error: operand type mismatch for `call'
joshua@nova:/tmp$

I pulled K&R C version 2 (this covers ANSI C) off my shelf and it says "Identifiers are a sequence of letters and digits. The first character must be a letter; the underscore _ character counts as a letter. Upper and lower case letters are different. Identifiers may have any length ... [obsolete verbiage omitted]."

This reference has clearly aged, and almost everybody now accepts high Unicode characters as letters. What's going on is that the back-end assembler sees symbols bytewise, and every byte with the high bit set counts as a letter. If you're crazy enough to use Shift-JIS outside of string literals, chaos can ensue; otherwise this tends to work well enough.

I accessed a draft of C18, which defines identifier-nondigit as one of: nondigit; universal-character-name; other implementation-defined characters. Therefore, implementations are allowed to permit additional characters.

For universal-character-name, we have a restriction: "A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ( $ ), 0040 ( @ ), or 0060 ( ` ), nor one in the range D800 through DFFF inclusive."
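As a minimal sketch, the quoted sentence can be written as a predicate on the code point. The function name ucn_value_allowed is hypothetical, and the check encodes only this one sentence; it does not model the separate Annex D list of code points that are valid inside identifiers.

```c
#include <stdbool.h>

/* Returns true if the quoted constraint permits a universal character name
 * with this value. Hypothetical helper: it covers only the quoted sentence,
 * not the Annex D ranges that further restrict identifiers. */
bool ucn_value_allowed(unsigned long code_point)
{
    /* "nor one in the range D800 through DFFF inclusive" */
    if (code_point >= 0xD800UL && code_point <= 0xDFFFUL)
        return false;

    /* "less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`)" */
    if (code_point < 0x00A0UL)
        return code_point == 0x0024UL ||
               code_point == 0x0040UL ||
               code_point == 0x0060UL;

    return true;
}
```

Under this reading, ucn_value_allowed(0x0024) is true, while ucn_value_allowed(0x0041), the code point of the letter A, is false.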

The following code still chokes at the assembly pass as expected:

#include <stdio.h>

extern int \U00000024func();

int main()
{
    return \U00000024func();
}

The following code builds:

#include <stdio.h>

extern int func\U00000024();

int main()
{
    return func\U00000024();
}
S.S. Anne
Joshua
  • 1
    You can view the drafts for free: https://www.pdf-archive.com/2014/10/02/ansi-iso-9899-1990-1/ansi-iso-9899-1990-1.pdf – S.S. Anne Sep 16 '19 at 01:27
  • 2
    That exact same wording is in C11 (n1570.pdf). But you don't seem to have looked at Appendix D (which is normative); U+0024 is *not* in the list of code values that are valid in universal character names in identifiers. So the C standard doesn't let you use \u0024 in an identifier, but it doesn't stop an implementation from allowing `$` as part of "other implementation-defined characters" (which possibly extends to the implementation-defined use of `\u0024` as well). – rici Sep 16 '19 at 03:00
1

It depends on the dialect of C and the options selected. Historically, some C compilers supported $ to be compatible with existing libraries when C was new. You may need a command line option to enable $, or another to turn it off if strictly conforming C is valuable to you.
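As one illustration of such options, GCC documents -fdollars-in-identifiers and -fno-dollars-in-identifiers. The sketch below keeps the $ spelling behind a switch so the same file builds either way. The macro USE_DOLLAR_NAMES and the function answer are hypothetical names made up for this example, and a local variable is used on purpose so that no symbol containing $ reaches the assembler.

```c
/* Hypothetical switch: build with -DUSE_DOLLAR_NAMES only on compilers
 * that accept '$' in identifiers (with GCC, optionally together with
 * -fdollars-in-identifiers). Without the macro, the file avoids '$' in
 * the code that is actually compiled, so it should also build with
 * -fno-dollars-in-identifiers. */
int answer(void)
{
#ifdef USE_DOLLAR_NAMES
    int $value = 42;   /* extension: '$' in an identifier */
    return $value;
#else
    int value = 42;    /* portable spelling */
    return value;
#endif
}
```

Either branch returns the same value; only the spelling of the local variable changes.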

A spot of history: in my early years I got into enough mainframe rooms to know that $ is one of what IBM mainframes called "national characters" ($, #, and @) that could show up in identifiers of programming languages like PL/1 and mainframe assembler. This carried down to some mainframe spin-offs, such as the IBM 1130. It looked to me like early impact printers, which printed with shaped slugs, and CRT terminals could swap out these characters to meet the national needs of foreign customers. The IBM 1403 printer had many "print chains" to choose from for different human languages and technical purposes.

Some non-IBM languages and systems picked up at least some of these characters in identifiers. GNU C, VMS, and JavaScript kept "$". "$" is the only one of these old characters that seems to have survived to this day, even as an option, in most languages. The odd thing is that back in the early IBM days the underscore was invalid in identifier names.

Gilbert