
The ECMA-48 ("ANSI escape sequences") standard describes two ways of encoding the C1 set of control codes: as two-character ESC sequences or, alternatively, as single 8-bit control characters.

Wikipedia articles explain that the two character ESC sequences are more appropriate for use with UTF-8.

Quoting from ANSI escape code:

The standard says that in 8-bit environments these two-byte sequences can be merged into single C1 control code in the 0x80–0x9F range. However on modern devices those codes are often used for other purposes, such as parts of UTF-8 or for CP-1252 characters, so only the 2-byte sequence is used.

and from C0 and C1 control codes:

The C1 characters in Unicode require 2 bytes to be encoded in UTF-8 (for instance CSI at U+009B is encoded as the bytes 0xC2, 0x9B in UTF-8). Thus the corresponding control functions are more commonly accessed using the equivalent two byte escape sequence intended for use with systems that have only 7-bit bytes.
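
To make the correspondence concrete, here is a minimal bash sketch (my own illustration, not taken from either article). It relies on the ECMA-48 rule that an 8-bit C1 code in the range 0x80–0x9F is equivalent to ESC (0x1B) followed by the byte that is 0x40 lower. The function name `c1_to_esc_hex` is hypothetical, and it only works on a hex value passed as an argument, not on a byte stream:

```bash
# Hypothetical helper: given a C1 control byte in hex (e.g. 9b),
# print the hex bytes of the equivalent two-character ESC sequence.
c1_to_esc_hex() {
  local c1=$((16#$1))            # parse the argument as a hexadecimal number
  if (( c1 < 0x80 || c1 > 0x9f )); then
    echo "not a C1 control byte: $1" >&2
    return 1
  fi
  # ESC is 0x1b; the second byte is the C1 code minus 0x40.
  printf '1b %02x\n' $(( c1 - 0x40 ))
}

c1_to_esc_hex 9b   # prints: 1b 5b  (CSI becomes ESC [)
```

This only illustrates the mapping. It does not deal with UTF-8 input, where bytes in the 0x80–0x9F range can also be continuation bytes of ordinary characters.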


Are there any command-line tools that can be used to directly convert 8-bit C1 control characters (as specified by ECMA-48) into two-character ESC sequences?

My best attempt so far has been to try and use iconv:

$ printf $(echo -en "\x9b") | iconv --from-code=ANSI_X3.4 --to-code=UTF-8 | od -t x1
iconv: illegal input sequence at position 0

For debugging purposes I'm using od -t x1 to render the result back into hexadecimal. The result I'm hoping to get would be the same as the result of running:

$ printf $(echo -en "\x1b[") | od -t x1
0000000 1b 5b
0000002

In other words, does there exist a command-line tool where you can pipe in a C1 control character like \x9b and get back an escape sequence like \x1b[?

EDIT: Or as egmont rightly suggests, more appropriately, an interactive tool rather than something you pipe into.
