The first 32 characters of the ASCII table (0x00 - 0x1F) are all of the non-printable characters (with the exception of 'DEL' which comes at the end of the table).

What are these used for, other than borking your terminal when you cat a binary file?

A few of them are obvious:

   Oct   Dec   Hex   Char
   ----------------------------------------------
   000   0     00    NUL '\0'

   007   7     07    BEL '\a' (bell)
   010   8     08    BS  '\b' (backspace)
   011   9     09    HT  '\t' (horizontal tab)
   012   10    0A    LF  '\n' (new line)
   013   11    0B    VT  '\v' (vertical tab)
   014   12    0C    FF  '\f' (form feed)
   015   13    0D    CR  '\r' (carriage ret)

   033   27    1B    ESC (escape)

others, such as

   020   16    10    DLE (data link escape)
   021   17    11    DC1 (device control 1)
   022   18    12    DC2 (device control 2)
   023   19    13    DC3 (device control 3)
   024   20    14    DC4 (device control 4)

I've never seen used.

Are ACK, NAK and SYN the same bytes used for the three-way handshake in TCP, or are they simply analogous?

Edit: see Eric Raymond's Things Every Hacker Once Knew

If you learned your chops after 1990 or so, the mysterious part of this is likely the control characters, code points 0-31. You probably know that C uses NUL as a string terminator. Others, notably LF = Line Feed and HT = Horizontal Tab, show up in plain text. But what about the rest?

Many of these are remnants from teletype protocols that have either been dead for a very long time or, if still live, are completely unknown in computing circles. A few had conventional meanings that were half-forgotten even before Internet times. A very few are still used in binary data protocols today.

3 Answers

These are called control codes. They were meant to tell the terminal you were on to do something, rather than to pass through and display something. Some of them, such as BEL (0x07), date back to when terminals were actual teletypewriters, where 0x07 would ring the teletype's physical bell.

DLE is meant to work like ESC: once the terminal receives it, the characters that follow are meant to be a command or other communication to the terminal, not output for the device itself. I've never witnessed an actual use of it, though.

ACK, NAK, and SYN (and many others, like SOH start of header, STX start of text, and ETX end of text) could be used to implement a protocol, but they weren't designed with TCP/IP in mind. TCP signals SYN and ACK by setting flag bits in its header, not by transmitting an entire ASCII code. The ASCII codes may be useful for something like transmitting files over a 56k modem. I know serial/modem protocols like ZModem use a couple of these, and I'm sure there is other serial or 56k-modem based software that does too.
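To make the difference concrete, here is a small Python sketch that puts the ASCII byte values next to the TCP flag bit masks. The ASCII values come from the ASCII table and the flag masks from the TCP header layout; the helper name `tcp_flag_names` is made up for this illustration.

```python
# ASCII control codes are whole byte values sent in the data stream.
ASCII_ACK = 0x06
ASCII_NAK = 0x15
ASCII_SYN = 0x16

# TCP flags are single bits inside the flags field of the TCP header.
# Note that TCP has no NAK flag at all.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10


def tcp_flag_names(flags):
    """Return the names of the flag bits set in a TCP flags value.

    Hypothetical helper for illustration only.
    """
    table = (("FIN", TCP_FIN), ("SYN", TCP_SYN), ("RST", TCP_RST),
             ("PSH", TCP_PSH), ("ACK", TCP_ACK))
    return [name for name, bit in table if flags & bit]


# The second step of the three-way handshake sets two bits in one field,
# rather than sending two separate characters.
syn_ack = TCP_SYN | TCP_ACK
print(hex(syn_ack), tcp_flag_names(syn_ack))
```

The names match, but neither the values nor the mechanism do: ASCII SYN is the byte 0x16, while TCP SYN is bit 0x02 of a header field.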

This Wikipedia page and this page available via the Wayback Machine know more than I do about them, including what the DC1, DC2, DC3, and DC4 codes were meant for.

LawrenceC

Back in the old days they had mainframe computers to do bulk processing and simple "terminal" machines that displayed the data.

These terminals were connected via serial data lines, which means some kind of command set is needed so that both sides can signal when they are ready to receive data, ready to send, and so on. The ASCII character set basically lists all the commands and characters available and standardises them.

The signals you have never seen used were likely only used on very specific terminals. ACK, SYN and NAK are analogous to their TCP counterparts, but they are not directly related.

Mokubai

The purpose of the first set of ASCII character codes, the control codes, is to give a computer a way to control a peripheral connected to it by cable, such as over an RS-232 or RS-485 serial interface or an IEEE 1284 parallel interface.

These control codes had several purposes:

  • control the behavior of the device (carriage return to return print head to beginning of line or bell to ring the bell on a teletype to catch the attention of an operator)
  • provide standard message format characters used in messages to a device (STX to indicate start of text or ETX to indicate end of text)
  • trigger control signalling embedded in the text stream (the ESC character to mark the beginning of a command to the device, the DC1 character as an XON indicator telling the remote device to resume sending text, or the DC3 character as an XOFF indicator telling the remote device to stop sending text)

RFC 20, ASCII format for Network Interchange, describes the proposed control codes and their functions. For instance, here is a partial list from section 5.2 of RFC 20, which covers the control codes:

      NUL (Null): The all-zeros character which may serve to accomplish
   time fill and media fill.
      SOH (Start of Heading): A communication control character used at
   the beginning of a sequence of characters which constitute a
   machine-sensible address or routing information.  Such a sequence is
   referred to as the "heading."  An STX character has the effect of
   terminating a heading.
      STX (Start of Text): A communication control character which
   precedes a sequence of characters that is to be treated as an entity
   and entirely transmitted through to the ultimate destination.  Such a
   sequence is referred to as "text."  STX may be used to terminate a
   sequence of characters started by SOH.
      ETX (End of Text): A communication control character used to
   terminate a sequence of characters started with STX and transmitted
   as an entity.
      EOT (End of Transmission): A communication control character used
   to indicate the conclusion of a transmission, which may have
   contained one or more texts and any associated headings.
      ENQ (Enquiry): A communication control character used in data
   communication systems as a request for a response from a remote
   station.  It may be used as a "Who Are You" (WRU) to obtain
   identification, or may be used to obtain station status, or both.
      ACK (Acknowledge): A communication control character transmitted
   by a receiver as an affirmative response to a sender.
      BEL (Bell): A character for use when there is a need to call for
   human attention.  It may control alarm or attention devices.

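The RFC 20 definitions above imply a simple message layout: SOH, a heading, STX, the text, then ETX. The following Python sketch builds and parses such a frame. It is only an illustration under stated assumptions: real protocols added their own rules (checksums, escaping, timeouts), and the function names `frame_message` and `parse_message` and the sample heading `PRINTER1` are invented here.

```python
# Byte values from the ASCII table.
SOH, STX, ETX = 0x01, 0x02, 0x03


def frame_message(heading, text):
    """Wrap heading and text as SOH heading STX text ETX.

    Assumption for this sketch: the payload may not contain the framing
    bytes themselves, so no escaping scheme is needed.
    """
    for part in (heading, text):
        if any(b in (SOH, STX, ETX) for b in part):
            raise ValueError("payload must not contain SOH, STX or ETX")
    return bytes([SOH]) + heading + bytes([STX]) + text + bytes([ETX])


def parse_message(frame):
    """Split a frame back into (heading, text)."""
    if len(frame) < 3 or frame[0] != SOH or frame[-1] != ETX:
        raise ValueError("malformed frame")
    stx_at = frame.index(STX)  # STX terminates the heading
    return frame[1:stx_at], frame[stx_at + 1:-1]


frame = frame_message(b"PRINTER1", b"hello")
print(frame)
print(parse_message(frame))
```

Each delimiter costs a single byte, which is the compactness these codes were chosen for.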
The original teletypes, or teleprinters, printed onto paper with a print head. They were later replaced by cathode ray tube (CRT) terminals, which emulated the behavior of the original teletypes on the surface of the screen.

The most famous example of a CRT terminal was the DEC VT-100, whose use of ANSI control codes became a de facto standard for CRT terminals. The VT-100 control codes are also commonly emulated in terminal windows on various Unix, Linux, MS-DOS, and other operating systems (see also the curses user interface programming library).

The VT100 was introduced in August 1978, replacing the VT50/VT52 family. Like the earlier models, it communicated with its host system over serial lines at a minimum speed of 50 bit/s, but increased the maximum speed to 19,200 bit/s, double that of the VT52.

The major change within the system was the control system. Unlike the VT50/52's proprietary cursor control language, the VT100 was based on the emerging ANSI X3.64 standard for command codes. At the time, computer vendors suggested that the standard was beyond the state of the art and could not be implemented at a reasonable price. The introduction of low-cost microprocessors and the ever-falling cost of computer memory addressed these problems, and the VT100 used the new Intel 8080 as its internal processor. In addition, the VT100 provided backwards compatibility for VT52 users, with support for the VT52 control sequences.

ASCII control code applications

The ASCII control codes were designed to provide a standard set of codes for common serial message protocol needs. While the structure or format of a message was proprietary as was the data sent in a message, the ASCII control codes provided standard codes that everyone could agree to use in the control part of their messages.

Modern protocols use text formats such as JSON or XML because their messages are mostly transported over high speed networks. Early computer equipment, however, typically used RS-232 or RS-485 serial communication, so the messages between devices were kept as compact and concise as possible. Rather than sending a text message such as "ACK", which is three characters, a device would send a single character, the ACK control code, when acknowledging a message.

One example of ASCII control codes in use is a PC sending text with formatting directives to a thermal receipt printer. The receipt printer is connected to the PC with a serial cable, often a DB-9 cable, or with a USB cable. The PC sends a series of text characters to the receipt printer, but some of the characters are printed normally while others are printed double high and double wide.

This change in the size of the printed text is communicated to the receipt printer by embedding print format commands in the stream of text characters, using the ESC (escape) control code. When the printer sees the ESC control code, it knows that a printer command follows, so it reads the command character sequence, changes its printing behavior, and then continues processing the text it is sent until it sees another ESC control code.

Also, while processing the text string, when the receipt printer sees an ASCII control code that changes the position of the print head (a carriage return, tab, backspace, or line feed character), it moves the print head accordingly and then resumes printing text.
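The printer behavior described above can be sketched as a small byte-stream interpreter in Python. The command set here is hypothetical: this sketch assumes ESC is followed by a single command byte, with `D` switching double-size printing on and `N` switching it off. Real printers define their own command sequences, so treat the names and commands as placeholders. The tab stop width of 8 columns is also an assumption.

```python
# Byte values from the ASCII table.
BS, HT, LF, CR, ESC, DEL = 0x08, 0x09, 0x0A, 0x0D, 0x1B, 0x7F


def interpret(stream):
    """Return a list of (row, column, char, double_size) print events."""
    events = []
    row = col = 0
    double_size = False
    data = iter(stream)
    for byte in data:
        if byte == ESC:
            # Hypothetical one-byte command following ESC.
            command = next(data, None)
            if command == ord("D"):
                double_size = True
            elif command == ord("N"):
                double_size = False
            # Unknown or missing commands are ignored in this sketch.
        elif byte == CR:
            col = 0                    # back to the start of the line
        elif byte == LF:
            row += 1                   # advance the paper one line
        elif byte == BS:
            col = max(0, col - 1)      # step back one position
        elif byte == HT:
            col = (col // 8 + 1) * 8   # next tab stop, assumed every 8 columns
        elif byte >= 0x20 and byte != DEL:
            events.append((row, col, chr(byte), double_size))
            col += 1                   # counts characters, not physical width
    return events


events = interpret(b"\x1bDTOTAL\x1bN\r\n9.99")
for event in events:
    print(event)
```

The command bytes after ESC never reach the paper; only the printable characters produce events, and the positioning codes only move the head.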

The Wikipedia topic IPTC 7901 describes the use of these control characters in news service messages, beginning with the formal approval of the IPTC protocol in 1979. The protocol sounds similar in purpose to an RSS feed. The actual specification is available from the IPTC web site as The IPTC Recommended Message Format, 1995.

Data flow control

With slow serial communication between two machines, a sending device transferring a large amount of data, such as a file, could easily send it faster than the receiving device could process it. This is why the RS-232 serial communication specification has several hardware flow control pins in the connector for communicating data set state. However, there were many cases where cables with the additional wires were not available, so software flow control using ASCII control codes, DC1 for XON and DC3 for XOFF, was introduced.
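Software flow control as described above can be simulated in a few lines of Python. This is a toy model, not how any particular driver works: the `Receiver` class, the `transfer` function, and the high and low water marks are all assumptions chosen for illustration. Only the DC1 and DC3 byte values come from the ASCII table.

```python
from collections import deque

XON, XOFF = 0x11, 0x13  # DC1 and DC3


class Receiver:
    """Toy receiver that asks the sender to pause when its buffer fills up."""

    def __init__(self, high_water=6, low_water=2):
        self.buffer = deque()
        self.high_water = high_water
        self.low_water = low_water
        self.paused = False

    def receive(self, byte):
        """Store one byte; return XOFF if the sender should pause."""
        self.buffer.append(byte)
        if not self.paused and len(self.buffer) >= self.high_water:
            self.paused = True
            return XOFF
        return None

    def process_one(self):
        """Consume one buffered byte; return XON once there is room again."""
        if self.buffer:
            self.buffer.popleft()
        if self.paused and len(self.buffer) <= self.low_water:
            self.paused = False
            return XON
        return None


def transfer(data, receiver):
    """Send data one byte at a time, obeying XON/XOFF from the receiver."""
    signals = []
    sending = True
    pending = deque(data)
    while pending or receiver.buffer:
        if sending and pending:
            reply = receiver.receive(pending.popleft())
        else:
            reply = receiver.process_one()
        if reply == XOFF:
            sending = False
            signals.append("XOFF")
        elif reply == XON:
            sending = True
            signals.append("XON")
    return signals


signals = transfer(b"0123456789ABCDEF", Receiver())
print(signals)
```

The flow control bytes travel over the same data wires as the text, which is why this approach works with a minimal cable but cannot carry arbitrary binary data that happens to contain 0x11 or 0x13 without some escaping scheme.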

Serial communication modems also used ASCII control codes for software flow control, as well as for applications such as faxes (see Wikipedia topic Command and Data modes (modem)).