Questions tagged [unicode]

Unicode is the standard for computer representation of plain text. It encompasses:

  • the Universal Character Set (UCS), intended to unambiguously represent all characters used in human writing systems in any language,
  • Unicode Transformation Formats (UTFs), defining standardized formats for storing and transmitting Unicode text, and
  • standards for processing and manipulating Unicode text.

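Version 6.0 was released in October 2010. Later versions have been published since, so check the Unicode Consortium's website for the current one.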

The Universal Character Set

Unicode assigns each character an integer code point (from 0 to 0x10FFFF) in the UCS to act as a unique reference. For example:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
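The mapping between characters and code points can be explored with a few lines of Python. This is only a minimal sketch: `code_point_label` is a hypothetical helper name chosen for illustration, while `ord` and `chr` are Python built-ins.

```python
# Illustrative sketch: convert between characters and code points.
# code_point_label is a hypothetical helper, not part of any library.

def code_point_label(ch: str) -> str:
    """Return the U+XXXX notation for a single character."""
    return f"U+{ord(ch):04X}"

# The same characters as in the list above (Greek letters written as escapes).
for ch in ["A", "B", "C", "\u039B", "\u039C"]:
    print(code_point_label(ch), ch)

# chr() is the inverse of ord(); 0x10FFFF is the highest valid code point.
highest = chr(0x10FFFF)
```

Passing anything above 0x10FFFF to `chr` raises a `ValueError`, which matches the range given above.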

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
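The byte sequences in the table can be reproduced with Python's built-in codecs. The sketch below assumes the standard codec names `utf-8` and `utf-16-be`; `hex_bytes` is a hypothetical helper, and U+1F600 is an extra example (not in the table) added only to show the four-byte case.

```python
# Illustrative sketch: encode code points as UTF-8 and big-endian UTF-16.
# hex_bytes is a hypothetical helper for display only.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# "A" and U+039B appear in the table; U+1F600 is an added four-byte example.
for ch in ["A", "\u039B", "\U0001F600"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    print(f"U+{ord(ch):04X}", "|", hex_bytes(utf8), "|", hex_bytes(utf16))
```

Using `utf-16-be` rather than plain `utf-16` avoids the byte order mark that Python would otherwise prepend, so the output lines up with the table.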

Specification

The Unicode Consortium also defines standards for sorting and collation algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
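As a rough illustration of two of these operations, Python's standard `unicodedata` module exposes normalization, and `str.upper` applies Unicode case mappings. The helper name `same_text` and the sample strings are assumptions made for this sketch. Collation is not shown because it depends on the locale.

```python
import unicodedata

# Illustrative sketch: two spellings of "é" that render identically.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = "\u00E9"     # LATIN SMALL LETTER E WITH ACUTE

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(decomposed == composed)           # raw comparison of code point sequences
print(same_text(decomposed, composed))  # comparison after normalization

# Case mapping can change a string's length: U+00DF uppercases to "SS".
print("stra\u00DFe".upper())
```

Normalizing both sides before comparing is a common way to treat visually identical strings as equal.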

Identifying Characters

726 questions
262
votes
5 answers

Which Unicode characters do smilies like ٩(•̮̮̃•̃)۶ consist of?

How do I make Unicode texts like the one in the title, or like the following, without resorting to copy+paste? ٩(-̮̮̃-̃)۶ ٩(●̮̮̃•̃)۶ ٩(͡๏̯͡๏)۶ ٩(-̮̮̃•̃). As an aside, note that in your browser the above example should look like:
Anirudh Goel
  • 2,951
143
votes
8 answers

Impossible to put a zero after an aleph?

Me and a friend were joking about aleph's. Upon trying to type א0 (switch those 2 chars), they switched themselves! Any sequence of symbols does not stop this effect. Why is this!?? Try to type these with the 0 and א reversed (c&p for א): א0 א - …
136
votes
8 answers

How can I type special characters in Linux?

In Windows, there is the possibility to type special signs from the keyboard by holding the Alt key and typing a few numbers, that depends on which sign you want to use. Does it work with Linux in the same way?
inothemo
  • 2,289
133
votes
8 answers

Is there a unicode character for the Windows key?

I'm trying to communicate over text the Windows keyboard shortcuts. For the ones that use the Windows key, I don't want to type "Windows key +" each time. Is there a unicode character for the Windows key?
Gabriel Fair
  • 4,093
129
votes
4 answers

Why shouldn’t I use Unicode characters to simulate typographic styles (such as small caps or script)?

Unicode contains various characters that look like typographically stylised variants of characters of the basic Latin alphabet and that allow one to write texts in the corresponding typographic styles without resorting to mark-up or similar. For…
Wrzlprmft
  • 2,803
127
votes
5 answers

How do these icons work: ✅️?

I can see these characters as colored icons: ✅️ It only works in Firefox for me. If you can't see the characters in color, it looks like this on my system (it's probably font-dependent): I can even see them in Firebug and tab titles: And…
Tomáš Zato
  • 4,790
95
votes
11 answers

How do you type Unicode characters using hexadecimal codes?

This is in Windows, but answers for other operating systems can be handy to others. Most guides say something to the effect of "hold down the Alt key and type in the code on the keypad". This works fine for decimal codes (like 65 for 'A'), but not…
user939
82
votes
5 answers

How do I debug an emoticon-based URL?

I came across this URL (NSFW) and need to convert this to puny code. As an experiment, I'll paste this URL here, but not sure if this will save. http://..ws/ (NSFW) How can I convert this URL to a standard DNS name so I can whois the IP space?…
76
votes
8 answers

How can I display the  (U+F8FF, Apple logo) character on Windows?

In Apple's marketing materials, the company often refers to the Apple Watch as "Watch". If that last sentence displayed as "Watch", congratulations! You're probably using an Apple device. To demonstrate, here's what the Wikipedia page for Apple…
Stevoisiak
  • 16,075
68
votes
2 answers

Setting UTF8 as default Character Encoding in Windows 7

Is there a way to set Windows 7 to globally use UTF-8 as standard? It's really annoying to set every single text editor to use it.
Baarn
  • 6,774
65
votes
15 answers

What's that Unicode character in my clipboard?

Is there a quick and easy way to find the Unicode code point for any character? For example, I see a funny character on a web page, or a PDF file, or some other document. What I currently do is copy the character to the clipboard, save it to a file,…
55
votes
5 answers

Notepad++ inserting special Unicode characters in UTF-8

What's the best ways to enter special Unicode characters into a Notepad++ document? Do I have to rely on the operating system (Windows)? Looking for a see-and-click solution. I can bring up the ASCII Insertion Panel with Edit | Character Panel —…
54
votes
12 answers

Is there a Pac-Man-like character in ASCII or Unicode?

Simple question: is there a character that looks either like Pac-Man, or like the ghost in Pac-Man? With Google's recent Pac-Man logo, everyone should know what these look like, but in case you don't here are some sample images: If you answer "no"…
Ricket
  • 1,606
48
votes
2 answers

What is this character: '*​'?

A friend pasted a command into a Slack chat room which contained the character *. This looks like a normal * but isn't: $ uniprops '*​' uniprops: no character named ‹*​› While if I run uniprops on the asterisk I get when typing on my machine, I…
terdon
  • 54,564
45
votes
2 answers

Difference Between Unicode FRACTION SLASH and DIVISION SLASH

What is the difference between U+2044 ("Fraction Slash") and U+2215 ("Division Slash")? They seem nearly identical to me. There's clearly still a difference, but I can't tell exactly what it is. Does anyone know?
user402879