98

Why is ^M used to represent a carriage return in VIM and other contexts?

My guess is that M is the 13th letter of the Latin alphabet and a carriage return is \x0D or decimal 13. Is this the reason? Is this representation documented anywhere?

I notice that Tab is represented by ^I, which is the ninth letter of the Latin alphabet. Conversely, Tab is \x09 or decimal 9, which supports my theory stated above. However, where might this be documented as fact?

Oliver Salzburg
dotancohen

7 Answers

120

I believe that what OP was actually asking about is called Caret Notation.

Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code that has the numerical value equivalent to the letter's numerical value. For example, the EOT character with a value of 4 is represented as ^D because D is the 4th letter in the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' is before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit of the ASCII code.
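The "invert the 7th bit" formulation can be checked with a few lines of Python. This is only a minimal sketch; the function name `caret_notation` is made up for illustration and is not part of any library.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Inverting the 7th bit (value 0x40) lands on the printable partner:
    # 13 -> 77 ('M'), 0 -> 64 ('@'), 127 -> 63 ('?').
    return "^" + chr(code ^ 0x40)


for code in (0, 4, 9, 13, 127):
    print(code, caret_notation(code))
```

The loop covers the cases mentioned above: NUL, EOT, Tab, carriage return and DEL.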

The full list of ASCII control characters along with caret notation can be found here

Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
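To illustrate why the ^M shows up, here is a rough Python sketch of what an editor that only treats LF as a line break might display for CRLF data. The helper `show_as_lf_editor` is hypothetical, assumes plain ASCII input, and is not how Vim is actually implemented.

```python
def show_as_lf_editor(data: bytes) -> list:
    """Split on LF only and render leftover control bytes in caret notation."""
    lines = []
    for raw in data.split(b"\n"):
        rendered = "".join(
            # Control bytes (0-31 and 127) become ^X; everything else is kept.
            "^" + chr(b ^ 0x40) if b < 0x20 or b == 0x7F else chr(b)
            for b in raw
        )
        lines.append(rendered)
    return lines


windows_text = b"first line\r\nsecond line\r\n"
for line in show_as_lf_editor(windows_text):
    print(line)
```

Each 0x0A ends a line, and the 0x0D left at the end of the line is the byte that gets displayed as ^M.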

gws
Art Gertner
23

That is exactly the reason.

ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):

   Oct   Dec   Hex   Char                       
   ─────────────────────────────────────────────
   000   0     00    NUL '\0'                    
   001   1     01    SOH (start of heading)     
   002   2     02    STX (start of text)         
   003   3     03    ETX (end of text)           
   004   4     04    EOT (end of transmission)   
   005   5     05    ENQ (enquiry)               
   006   6     06    ACK (acknowledge)           
   007   7     07    BEL '\a' (bell)             
   010   8     08    BS  '\b' (backspace)       
   011   9     09    HT  '\t' (horizontal tab)  
   012   10    0A    LF  '\n' (new line)        
   013   11    0B    VT  '\v' (vertical tab)    
   014   12    0C    FF  '\f' (form feed)       
   015   13    0D    CR  '\r' (carriage ret)    

Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.

The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.

Flup
14

The notation goes back to the earliest ASCII Teletypes (ca 1963). There was a CTRL key that toggled the 0x40 bit so that CTRL-M (carriage return) would be 0D instead of 4D, CTRL-G (bell) would be 07 instead of 47, CTRL-L (form feed) would be 0C instead of 4C.
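The arithmetic in that paragraph can be reproduced in Python. This is just a sketch of the bit manipulation as described here, with a made-up helper name `ctrl`; it says nothing about how the Teletype hardware was actually wired.

```python
def ctrl(key: str) -> int:
    """Code produced by CTRL + key, modelled as flipping the 0x40 bit."""
    return ord(key.upper()) ^ 0x40


for key in "MGL":
    # Show the letter's own code next to the control code it turns into.
    print(f"CTRL-{key}: {ord(key):02X} -> {ctrl(key):02X}")
```

For the three keys named above, the hex values come out as 4D to 0D, 47 to 07 and 4C to 0C.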

There was no "design" in assigning particular letters to particular functions; it was just chance that, when the dust settled from assigning ASCII codes, the M key was one bit different from carriage return, and hence carriage return became CTRL-M.

Here is the best shot I can find of an ASR33 keyboard. As you can see the control character names are printed in small letters on the corresponding alpha keys.

Teletype Model 33 ASR with paper tape punch/reader

Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons

The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.

Palec
3

The caret (^) is just shorthand for "hold the Control (CTRL) key down".

In the good old days you could type these codes (see above) in directly: Ctrl + G (^G) would make the terminal go "ding".

When you want to add a CR in Vim you use Ctrl + M; likewise, Tab is Ctrl + I. (To insert the literal control character rather than trigger its usual action, you typically press Ctrl + V first.)

Don
2

The reason is the need for some visual way of displaying what are, by definition, non-printable characters.

So, someone in the early 1970s or maybe earlier (I remember seeing it on CP/M, and someone else has already mentioned TOPS) decided that "caret plus letter" would be the symbol for the 26 unprintable ASCII control characters with values 1 through 26. Value 0 is/was printed as ^@, and value 127 as ^?.

RonJohn
1

As for where it is documented: this page lists every control character along with how to enter or represent it with the Control key (though the first one, ASCII character 0, has no Control-key representation there, and the page has nothing for character 127). It also provides sources at the bottom.

https://www.cs.tut.fi/~jkorpela/chars/c0.html

One might wonder, given that there are 33 control characters (ASCII characters 0-31, so 32 characters, plus character 127, making 33), how they can all be represented when there are only 26 letters in the alphabet. Well, it uses Ctrl-A for ASCII character 1 through Ctrl-Z for ASCII character 26, and once it gets past Ctrl-Z, it uses [ \ ] ^ _ for the rest.
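The counting can be double-checked with a short Python sketch. The variable names are invented for illustration, and the ^@ entry for character 0 follows the convention mentioned in the other answers rather than the linked page.

```python
import string

# 32 codes in the range 0-31, plus DEL (127), gives 33 control characters.
control_codes = list(range(32)) + [127]
print(len(control_codes))

# The 32 consecutive ASCII characters from '@' to '_' pair up with codes 0-31.
suffixes = "@" + string.ascii_uppercase + "[\\]^_"
mapping = {code: "^" + suffixes[code] for code in range(32)}

# The codes past Ctrl-Z are the ones that need the non-letter symbols.
for code in range(26, 32):
    print(code, mapping[code])
```

The letters cover 1 through 26, and the five symbols after Z cover 27 through 31.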

It lists Ctrl-Z as SUB, though in DOS and the cmd prompt it's EOF, and as a techie user you use it when doing copy con a.a, where a.a is your file. You enter the text and terminate it with Ctrl-Z, which funnily enough doesn't enter an EOF marker but does tell CMD that's the end of the file, so CMD writes it.

That cs.tut.fi webpage gives this as a source
http://www.wps.com/texts/codes/X3.4-1963/index.html

It's a broken link, but the document is available on archive.org, in the form of JPGs:

American Standard Code for Information Interchange
ASA standard X3.4-1963

https://web.archive.org/web/20010430085116/http://www.wps.com/texts/codes/X3.4-1963/index.html

barlop
0

You can see the Control mapping of all of the non-printable ASCII characters in this table.

Ofir Luzon