I have a Perl script that produces UTF-8 output. I tried using Set-Content to write a UTF-8 file, as suggested in "Powershell overruling Perl binmode?".

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding Byte

produces the error

"Set-Content : Cannot proceed with byte encoding. When using byte encoding the content must be of type byte."

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

doesn't produce an error message, but it doesn't write a correct utf8 file either.

The output of the perl script is displayed correctly in the Powershell window. What is the correct way to write that output to a utf8-encoded file?

Thanks.

Update: I have seen many responses to this and similar problems, here at the link referenced above, and at https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8. None of them appear to work, leading me to believe that not one has actually been tested. A tested method for redirecting UTF8 text output from a CLI program to a file is desired. Thanks.

Here is the perl test script:

use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");
print("The Crüxshadows");

1 Answer

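Make sure PowerShell uses UTF-8 when communicating with external programs. (In PowerShell 6 and later the built-in cmdlets already default to UTF-8; in Windows PowerShell 5.1, Set-Content defaults to the system's ANSI code page unless -Encoding is given.) This requires setting [console]::InputEncoding and [console]::OutputEncoding to UTF-8.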

On my Windows 10 system, PowerShell uses Code Page 437 by default:

PS C:\Users\Me> [Console]::OutputEncoding

IsSingleByte      : True
EncodingName      : OEM United States
WebName           : ibm437
HeaderName        : ibm437
BodyName          : ibm437
Preamble          :
WindowsCodePage   :
IsBrowserDisplay  :
IsBrowserSave     :
IsMailNewsDisplay :
IsMailNewsSave    :
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437
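To see why a Code Page 437 console encoding leads to a broken file, here is a rough model of the byte handling. It is written in Python purely for illustration (Python is not part of the original setup), and the helper name pipe_to_file is hypothetical. It assumes PowerShell decodes the external program's output with [Console]::OutputEncoding and that Set-Content then re-encodes the resulting string as UTF-8; any BOM is ignored.

```python
# Illustrative model only: Python stands in for the byte handling that
# happens between Perl, PowerShell, and Set-Content.

def pipe_to_file(text: str, console_encoding: str) -> bytes:
    """Return the bytes that would land in the file (BOM ignored)."""
    # Perl with binmode(STDOUT, ":utf8") emits UTF-8 bytes.
    perl_bytes = text.encode("utf-8")
    # PowerShell decodes them using [Console]::OutputEncoding.
    decoded = perl_bytes.decode(console_encoding)
    # Set-Content -Encoding UTF8 encodes the decoded string again.
    return decoded.encode("utf-8")

original = "The Cr\u00fcxshadows"  # \u00fc is the letter ü

garbled = pipe_to_file(original, "cp437")
intact = pipe_to_file(original, "utf-8")

print(garbled == original.encode("utf-8"))  # False: ü became two box-drawing characters
print(intact == original.encode("utf-8"))   # True: the bytes round-trip unchanged
```

In this model the two UTF-8 bytes of ü (C3 BC) are each read as a separate CP437 character, so the file ends up with valid UTF-8 for the wrong text. Decoding as UTF-8 instead leaves the bytes unchanged.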

We fix this for the current PowerShell session with this command:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

(See the above-linked github.com issue for ways to persist this change.)

PS C:\Users\Me> [Console]::OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

Windows 7 and later, i.e. all supported Windows versions, have codepage 65001, as a synonym for UTF-8
-- https://en.wikipedia.org/wiki/UTF-8

Now your script works as expected.

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

Successfully tested on PowerShell 5.1 and 7.1.

If you prefer BOM-less output:

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8NoBOM

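Successfully tested on PowerShell 7.1. (The UTF8NoBOM encoding was introduced in PowerShell 6. Note that in PowerShell 6 and later, -Encoding UTF8 is also written without a BOM, whereas Windows PowerShell 5.1 writes a BOM for UTF8.)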
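For anyone unsure what the BOM amounts to at the byte level, here is a small sketch, again in Python purely for illustration. Python's "utf-8-sig" codec is used here as a stand-in for a UTF-8 writer that emits a BOM; it is not what PowerShell calls internally.

```python
import codecs

text = "The Cr\u00fcxshadows"  # \u00fc is the letter ü

with_bom = text.encode("utf-8-sig")   # UTF-8 preceded by a byte order mark
without_bom = text.encode("utf-8")    # plain UTF-8

print(with_bom.startswith(codecs.BOM_UTF8))     # True
print(without_bom.startswith(codecs.BOM_UTF8))  # False
print(len(with_bom) - len(without_bom))         # 3
```

The only difference is the three leading bytes EF BB BF, so a quick way to check which variant a file uses is to look at its first three bytes.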