6

This is somewhat related to question

On Windows 7, dir and tree can't show Unicode characters, even when cmd is started with cmd /U.

I found that the only way I can get Unicode output into a file on Windows 7 is:

> cmd /U
> dir /B > files.txt

The resulting file is in "Unicode" encoding, as Notepad shows when I open it and choose "Save As". If I instead run dir /B > files.html and open the HTML file in Firefox, it displays correctly with the encoding set to UTF-16 (or UTF-16 LE).
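As a rough illustration of what such a file contains, here is a small Python sketch. The file names are hypothetical stand-ins for real dir /B output, and it assumes that cmd /U writes redirected output as UTF-16 LE with CRLF line endings and no byte order mark.

```python
# Hypothetical file names standing in for the output of `dir /B`.
names = ["readme.txt", "\u65e5\u672c\u8a9e.txt", "\u03a9mega.txt"]

# Assumption: redirected output from `cmd /U` is UTF-16 LE, CRLF, no BOM.
data = "".join(name + "\r\n" for name in names).encode("utf-16-le")

# Each of these characters occupies exactly two bytes in UTF-16.
print(len(data), "bytes")

# Decoding with the matching encoding recovers the original names.
decoded = data.decode("utf-16-le").splitlines()
print(decoded)
```

Reading the same bytes with a single-byte code page instead would show a NUL after every ASCII letter and garbage for everything else, which is why the encoding has to be chosen explicitly in the viewer.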

But if I want to see the output on the screen instead of sending it to a file, it still seems impossible. Is there a way to make it happen? Possibly by somehow telling cmd not to show nonprintable characters as "?".

Update: I tried cmd.exe, Cygwin's bash on Windows, and PowerShell. They all behave the same, except that if I change "Properties -> Font" to Consolas or Lucida Console, there is some improvement: instead of a question mark, I now get either an empty square or a square with a question mark in it.

The more expensive Mac computers with Mac OS X can do it. The free Ubuntu can do it too.

nonopolarity

5 Answers

6

This is a very old question, but all of the answers given here are wrong.

You will never see Unicode output on the Windows command line (CMD.exe). The reason is that CMD cannot display Unicode. It can, however, display DBCS (Double-Byte Character Set).

If you want to see Japanese output, for example, you have to change your System Locale to Japanese and reboot. Then, you'll be able to see Japanese DBCS (i.e. Shift-JIS) characters on the command line. Windows supports Japanese Shift-JIS, Simplified Chinese, Korean, and Traditional Chinese "Big5" DBCS code pages.
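The difference between a DBCS code page and a code page that lacks the character can be sketched in Python. This only mimics the conversion with Python's own codecs (cp932 for Shift-JIS, cp437 for the US OEM code page); it is not what CMD itself runs.

```python
text = "\u3042"  # HIRAGANA LETTER A

# Shift-JIS (Windows code page 932) is a double-byte set that contains it.
sjis = text.encode("cp932")
print(sjis.hex())

# Code page 437 has no such character, so a lossy conversion
# substitutes a question mark.
oem = text.encode("cp437", errors="replace")
print(oem)
```

The second conversion is one plausible source of the "?" characters described in the question.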

Incidentally, you can pipe UTF-16 (inaccurately used interchangeably with "Unicode" by Microsoft) to a file, then open that file in, say, Notepad, and view the Unicode characters. You can also mark and copy the gibberish text from CMD.exe and paste it into Notepad and see the Unicode characters. In other words, CMD supports Unicode, but it doesn't display Unicode.

You can find more information in this blog post.

Jeff
1

Based on your username, I suspect you mainly work with Asian languages.

Windows tools normally operate in Unicode mode (as you saw by piping the output of dir into a file and opening that file with an editor):

  1. the tool does its stuff
  2. it outputs unicode characters
  3. another program takes this output and has to display it.

To display any character on the screen, the program from step 3 has to look up the glyph appropriate for the given byte sequence. Example:

  • 0x61 'a' maps to a different glyph in each font (so the 'a' looks different from font to font)

  • 0x3A9 'Ω' (Greek 'omega') maps to a different glyph in each font as well

This mapping only works IF the font has a glyph for the given byte sequence. Otherwise the visual result differs: sometimes you see '?', sometimes diamonds, etc.
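The lookup can be sketched in Python. The render function and its ASCII-only "font" are purely hypothetical, made up for illustration; a real text renderer works on font tables rather than a simple range.

```python
import unicodedata

# The code points behind the two example characters.
for ch in ("a", "\u03a9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

def render(text, covered=range(0x20, 0x7F), missing="\u25a1"):
    """Hypothetical renderer: keep a character if the 'font' covers it,
    otherwise substitute a placeholder box."""
    return "".join(ch if ord(ch) in covered else missing for ch in text)

rendered = render("a\u03a9")
print(rendered)
```

With a font that covers only ASCII, the omega comes out as a box, which matches the square placeholders reported after switching fonts.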

Again: dir produces byte sequences, which are sometimes purely in the ASCII range and sometimes outside it (depending on the filenames it finds). It sends these sequences to another program, which is responsible for actually rendering them. To display a sequence, that program has to map it to a glyph, which means searching a font for that glyph. If the font has no glyph for the given sequence, the program cannot display the byte sequence produced by, for example, dir.

So, the solution to your problem (seeing any Unicode character in the Windows console/terminal) is: use a font in that program which has a glyph for (almost) every Unicode byte sequence.

akira
0

https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how

Use chcp 65001 to change the code page to UTF-8, and use the Lucida Console font.
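Why the code page matters can be sketched in Python: the same UTF-8 bytes read back correctly only when they are interpreted as UTF-8. Here cp437 stands in for a typical default OEM code page, which is an assumption; the actual default depends on the system locale.

```python
text = "\u03a9.txt"
utf8 = text.encode("utf-8")

# Interpreted under an OEM code page (cp437 here), every byte becomes
# its own character, so the multi-byte sequence turns into gibberish.
garbled = utf8.decode("cp437")

# Interpreted as UTF-8 (what code page 65001 selects), the text survives.
correct = utf8.decode("utf-8")

print(repr(garbled), repr(correct))
```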

ta.speot.is
0

It has nothing to do with encodings, since the Windows console always uses Unicode internally. The characters are simply not available in the fonts you use, which are designed for programming and European languages. I don't have access to Windows at the moment, but I remember that I could print Greek characters after switching to the Lucida Console font. Using a font like DejaVu Sans Mono might work.

Philipp
0

OK, here is a solution using PowerShell:

1) Click the Start button on Windows 7
2) Now, in the blank line, type in PowerShell
3) Choose PowerShell ISE <-- note it is ISE

Now, if you run ls, you will be able to see Unicode characters.

4) If you also run chcp 65001, then UTF-8 characters printed by your program will be displayed correctly as well.

You can also run ls > list.txt and then type list.txt, and the content shows up in Unicode characters as well.

tree will still not show Unicode characters.

Also, inside the PowerShell ISE, cmd /U /C dir /B will not work either.

ls -R will.

nonopolarity