Convert UTF-16 LE to UTF-8 in Windows via the command line

Question

(question re-written to be more useful)

I have a batch script that interacts with command-line programs, captures their output, and then makes decisions based on that output.

One of the programs I need to interact with is fairly old, so I am stuck with its quirks. When I redirect its output to a text file, that file is encoded as UTF-16 LE.

Here's how I do that:

program -parameter > resultat.txt

Under Windows 7, this encoding seems to be troublesome for cmd/batch work, because you cannot read the contents of such a text file into a variable.

Here is an example (this only reads the first line of the text file):

set /p Var=<resultat.txt
echo %Var%
cmd /k

It echoes nothing and only prints "ECHO is on.".

Also, if you use "type" to print the contents of the text file, there is odd spacing between the characters, which suggests the file is not being decoded properly.
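For reference, here is what such a file holds byte by byte. This is a small Python sketch, used only for illustration and not part of the batch workflow: UTF-16 LE stores each ASCII character as the character followed by a 0x00 byte, which is consistent with the extra spacing "type" shows when it does not recognize the encoding.

```python
# Sketch: the bytes a BOM-less UTF-16 LE file holds for "OK" followed by CR LF.
text = "OK\r\n"
raw = text.encode("utf-16-le")  # the explicit-endian codec writes no BOM

# Every ASCII character becomes two bytes: the character itself, then 0x00.
print(raw)  # b'O\x00K\x00\r\x00\n\x00'

# Read one byte at a time, as a single-byte code page would, a NUL sits
# after every visible character.
print(list(raw))  # [79, 0, 75, 0, 13, 0, 10, 0]
```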

Attempted solution [1] - Powershell

After some research, I found that PowerShell can convert the encoding of text files like this:

Get-Content -Path "path\file.txt" | Out-File -FilePath "path\new_file.txt" -Encoding <encoding>

So which encoding do I need to end up with? I used Notepad++ to find out.

UTF-8 without a BOM (which, for plain ASCII text like this, is byte-for-byte the same as what Notepad saves as "ANSI") is the encoding I need: reading the file into a variable and the "type" command both work flawlessly with it. How do I know? If I open the redirected text file in Notepad and resave it as "ANSI", everything works.

-Encoding ascii

...is the option that should have worked, since its output is also valid UTF-8 without a BOM, but it does not seem to handle the UTF-16 LE source file and does not produce usable output. When I opened the resulting file in Notepad++, it identified it as UTF-16 LE "Unix", which was odd.
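One plausible explanation, not verified against this particular program: Windows PowerShell 5.1's Get-Content falls back to the system ANSI code page when a file has no BOM, so the 0x00 bytes are read as NUL characters and written straight back out by "-Encoding ascii". The Python sketch below mimics that with cp1252 (an assumption; the real ANSI code page depends on the system locale) and ignores how Get-Content splits lines. Leftover NUL bytes would also explain why Notepad++ still guessed a UTF-16 encoding.

```python
# Sketch, assuming Windows PowerShell 5.1 reads a BOM-less file with the ANSI
# code page (cp1252 here) instead of UTF-16 LE. Line splitting is ignored.
raw = "OK\r\n".encode("utf-16-le")   # what the old program writes: no BOM

misread = raw.decode("cp1252")       # each 0x00 byte becomes a U+0000 character
print(repr(misread))                 # 'O\x00K\x00\r\x00\n\x00'

rewritten = misread.encode("ascii")  # ASCII output keeps U+0000 as a 0x00 byte
print(b"\x00" in rewritten)          # True: the NULs survive the "conversion"
```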

Funnily enough, if I resave the redirected text file as "Unicode" in Notepad, this produces a UTF-16 LE BOM file, which the above conversion parameter turns into a perfect UTF-8 file. At this point I extended my research to the question "How can I add a BOM to a UTF-16 LE file?", since I could combine that with the PowerShell method. Spoiler alert: I did not find a decent answer.
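Adding the BOM is only a matter of putting the two bytes FF FE in front of the file's contents. A minimal Python sketch of that step (Python and the script name add_bom.py are my own choices for illustration, not something from the question):

```python
import codecs
import sys


def add_utf16le_bom(data: bytes) -> bytes:
    """Return data with the UTF-16 LE byte order mark (FF FE) in front.

    Assumes data is UTF-16 LE text; if it already starts with FF FE it is
    returned unchanged, so the BOM is never doubled.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data
    return codecs.BOM_UTF16_LE + data


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python add_bom.py resultat.txt resultat_bom.txt
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, "rb") as f:
        payload = f.read()
    with open(dst, "wb") as f:
        f.write(add_utf16le_bom(payload))
```

Both files are opened in binary mode, so nothing else about the content (including CR LF line endings) is changed.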

-Encoding utf8

...is a similar option, but it produces UTF-8 with a BOM (the equivalent of saving as "UTF-8" in Notepad), and the output is corrupted.

So to sum up:

I am looking for a command-line tool or method (open or proprietary, first or third party) that can perform either of these conversions:

UTF-16 LE - Windows(CR LF) straight to UTF-8 - Windows(CR LF)
UTF-16 LE - Windows(CR LF) to UTF-16 LE BOM - Windows(CR LF)
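Both conversions are simple at the byte level. As a sketch of the first one (not a drop-in tool; Python and the script name to_utf8.py are assumptions for illustration): decode UTF-16 LE, skip a leading BOM if there is one, and write UTF-8 with no BOM, leaving CR LF untouched.

```python
import codecs
import sys


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 LE bytes (BOM optional) to UTF-8 bytes without a BOM.

    Working on bytes rather than text-mode files means no newline translation
    happens, so CR LF pairs in the input stay CR LF in the output.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]  # drop the BOM so it is not re-encoded
    return data.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python to_utf8.py resultat.txt resultat_utf8.txt
    with open(sys.argv[1], "rb") as src:
        converted = utf16le_to_utf8(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(converted)
```

If the input is not valid UTF-16 LE (for example, it has an odd number of bytes), decode raises UnicodeDecodeError instead of silently writing garbage.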

score 4 · Accepted Answer · answered Jun 06 '23 at 22:54

Path of least resistance: use libiconv for Windows

After about a day of searching (back when the question was asked), I noticed that Stack Overflow had a tag called [utf16-le], and I decided it would be worth my time to go through all of the threads using this tag.

I found a solution that uses a program called "iconv", including the full command needed to carry out the conversion. Unlike the PowerShell method, you need to specify the input encoding as well as the output encoding, but also unlike the PowerShell method, it produces a good result.

Here is the helpful thread:

https://stackoverflow.com/questions/17287713/using-iconv-to-convert-from-utf-16le-to-utf-8

iconv is not a Windows utility, but it has been ported to Windows, and although the question linked above was asked with the [Linux] tag, one of its answers contains an example that works unchanged on Windows:

iconv -f UTF-16LE -t UTF-8 infile > outfile
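One detail about naming the input encoding explicitly: as far as I know, with -f UTF-16LE iconv treats a leading FF FE as an ordinary U+FEFF character rather than stripping it, so a file that does have a BOM would come out with a UTF-8 BOM (EF BB BF) at the start. For the BOM-less files the old program writes, that is not an issue. Python's codecs draw the same distinction between the explicit-endian "utf-16-le" and the BOM-aware "utf-16", which makes the effect easy to see:

```python
# Sketch: explicit-endian UTF-16 decoding keeps a leading BOM as a character.
bom_file = b"\xff\xfe" + "Hi".encode("utf-16-le")

as_le = bom_file.decode("utf-16-le")  # analogous (assumed) to iconv -f UTF-16LE
as_auto = bom_file.decode("utf-16")   # BOM is consumed and selects the byte order

print(repr(as_le))    # '\ufeffHi'
print(repr(as_auto))  # 'Hi'

# Re-encoding the first result puts a UTF-8 BOM (EF BB BF) at the start.
print(as_le.encode("utf-8")[:3])  # b'\xef\xbb\xbf'
```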

I downloaded the files from here:

https://sourceforge.net/projects/gnuwin32/files/libiconv/1.9.2-1/

I only needed the "bin" (binaries) and "dep" (dependencies) packages: extract the contents of both into the same folder and you are good to go.

Charles Miller · Answer 2 · 2024-11-20T04:35:45.797

find /v "" sourcefile > destinationFile

This reads the contents of sourcefile and prints every line that does NOT match "" (nothing), thereby printing the contents of the entire file.

The find command seems to parse UTF-16 fine for me, and it also happens to output plain ASCII, so your destination file will contain the same text as the source, but as ASCII.

Edit: to address @ellen22's comment about getting rid of the header that find adds to its output, run it from a for loop and skip those lines. The delims= option keeps each line whole; without it, for /f keeps only the first space- or tab-separated word. For example:

  for /f "usebackq skip=2 delims=" %%A in (`find /v "" sourcefile`) do @(echo %%A)>>destinationFile

Caveat: batch will now open the file, write to it, and close it for each line. To speed this up, put the whole loop in its own block:

(
  for /f "usebackq skip=2 delims=" %%A in (`find /v "" sourcefile`) do @(echo %%A)
) > destinationFile

Now the destination file is opened once for the whole loop instead of once per line. Faster!

score 2 · Answer 3 · answered May 30 '23 at 07:59

The type command will work if the UTF-16 file does contain a BOM:

type utf16.txt >ascii.txt

But in your case the generated file has no BOM, so a sure-fire method for converting it is PowerShell, telling Get-Content the input encoding explicitly ("Unicode" means UTF-16 LE in Windows PowerShell):

powershell "Get-Content 'utf16.txt' -Encoding Unicode | Out-File 'ascii.txt' -Encoding ascii"

Notice the use of two types of quotes to avoid the need to escape the inner quotes.
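A caveat on -Encoding ascii: it is fine for plain English output, but any character outside ASCII is lost. A quick Python illustration (the sample string is made up, and it assumes PowerShell's ASCII encoder substitutes "?" the way Python's errors="replace" does):

```python
# Sketch: ASCII output cannot hold non-ASCII characters. errors="replace"
# substitutes "?" (assumed to match the behaviour of -Encoding ascii).
original = "R\u00e9sultat: OK"  # "Résultat: OK"
ascii_bytes = original.encode("ascii", errors="replace")
print(ascii_bytes)  # b'R?sultat: OK'

# UTF-8 keeps the accented letter intact as two bytes (C3 A9).
print(original.encode("utf-8"))  # b'R\xc3\xa9sultat: OK'
```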

score 0 · Answer 4 · answered Nov 15 '23 at 03:05

For the "add missing BOM" option: I don't have Windows 7, but in 8.1 (or 10):

Open Notepad, don't enter anything, and save as "Unicode" ("UTF-16 LE" in 10); this creates a file containing only the little-endian BOM.
copy /b bomfile+bomless_utf16le newfile
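Binary mode (copy /b) matters for the concatenation step: without it, copy combines files in ASCII mode, which treats Ctrl+Z (0x1A) as end-of-file and appends one to the output. That byte can occur inside ordinary UTF-16 LE text, as this small Python check shows (the example letter is arbitrary):

```python
# Sketch: a UTF-16 LE code unit can contain the byte 0x1A (Ctrl+Z), which
# copy's ASCII mode treats as end-of-file.
letter = "\u011a"  # LATIN CAPITAL LETTER E WITH CARON
encoded = letter.encode("utf-16-le")
print(encoded)             # b'\x1a\x01'
print(b"\x1a" in encoded)  # True
```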

The result works for me with type and PowerShell's Get-Content.

But it's not as devious as Charles' find /v ""!
