I have a multiline $string variable that contains UTF-8 encoded CSV. I open this string as an in-memory file for processing and print its contents.
open(my $fh, "<", \$string) or die "Cannot open in-memory file: $!";
$/=undef;
say <$fh>;
With hexdump I see the text is UTF-8 (É is c3 89).
Now I read the string through Text::CSV.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
my $line;
$csv->say(\*STDOUT, $line) while ($line = $csv->getline($fh));
The É character has now become the single byte c9 (its Unicode code point?). If I print that to my console I get � instead of É.
I use perl 5.28.0.
Why is Text::CSV altering the encoding, and how can I avoid it?
EDIT
I've made progress, thanks to @Gilles Quénot and @ikegami, and some trial and error.
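What happened is that Text::CSV decoded my strings into Perl's internal string format (characters rather than UTF-8 bytes). Strings in Perl's internal format are not output correctly to my UTF-8 terminal unless the program contains use open ':std', ':encoding(UTF-8)';. This directive is apparently needed in my program's main file only, presumably because :std applies to the global STDIN, STDOUT and STDERR handles.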
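Here is a minimal sketch of my original example with that fix applied. The sample data is made up for illustration, and I decode explicitly on the in-memory handle (an assumption on my part, so the result does not depend on what Text::CSV does with raw bytes):

```perl
use strict;
use warnings;
use utf8;                               # literals in this file are UTF-8
use open ':std', ':encoding(UTF-8)';    # encode STDOUT/STDERR, decode STDIN
use Encode qw(encode);
use Text::CSV;

# Hypothetical sample data: raw UTF-8 bytes, as they would arrive from outside.
my $string = encode('UTF-8', "id,name\n1,Émile\n");

# Decode at the input boundary: the handle yields characters, not bytes.
open(my $fh, '<:encoding(UTF-8)', \$string)
    or die "Cannot open in-memory file: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;                   # fields are character strings here
}
close $fh;

# Encode at the output boundary: the :std layer turns characters back into UTF-8.
$csv->say(\*STDOUT, $_) for @rows;
```

Piping the output through hexdump -C should show É as c3 89 again, as in the one-liner tests further down.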
Another problem I had (absent from my example) was that I needed use utf8 in every source file, so that the string literals in my program are converted into Perl's internal format. Without it, comparisons such as "É" eq $some_var fail, because the literal is UTF-8 bytes (my editor saves files in that encoding) while the variable holds a string in Perl's internal format.
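A small sketch of that comparison problem, assuming the source file is saved as UTF-8. The variable names are only for illustration, and Encode::decode stands in for whatever decoded the data in the real program:

```perl
use strict;
use warnings;
use utf8;                       # this source file is saved as UTF-8
use Encode qw(decode);

# Hypothetical input: the two raw bytes c3 89, as read from a byte stream.
my $bytes   = "\xC3\x89";
my $decoded = decode('UTF-8', $bytes);   # one character, U+00C9

my $literal = "É";              # one character, thanks to `use utf8`

my $chars_match = $literal eq $decoded ? 1 : 0;
my $bytes_match = $literal eq $bytes   ? 1 : 0;

printf "literal length:       %d\n", length $literal;
printf "literal eq decoded:   %s\n", $chars_match ? 'yes' : 'no';
printf "literal eq raw bytes: %s\n", $bytes_match ? 'yes' : 'no';
```

With use utf8 the literal matches the decoded string and not the raw bytes. Removing use utf8 flips both results, which is the failure I was seeing.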
Another problem I encountered was stacked encoding (double encoding). Once use open ':std', ':encoding(UTF-8)'; is in place, any other encoding instruction applied to the same output must be removed from the program. The symptom I had was characters being output as 4 bytes instead of 2.
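The 4-byte symptom can be reproduced in isolation. This is only a sketch using Encode::encode twice to imitate two stacked encoding steps; hex_bytes is a helper name I made up:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Helper (hypothetical name): render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $char  = "\x{C9}";                    # É as a single character
my $once  = encode('UTF-8', $char);      # correct: encoded once
my $twice = encode('UTF-8', $once);      # bug: already-encoded bytes encoded again

print "once:  ", hex_bytes($once),  "\n";   # prints "once:  c3 89"
print "twice: ", hex_bytes($twice), "\n";   # prints "twice: c3 83 c2 89"
```

The second call treats the bytes c3 and 89 as if they were the characters U+00C3 and U+0089 and encodes each of them again, which is where the 4 bytes come from.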
EDIT 2
Here are simple tests that really helped me understand.
# no conversion to internal perl string format
$ perl -M'5.28.0' -e 'say "É"' | hexdump -C
00000000  c3 89 0a                                          |...|
00000003
# string literals converted to perl string format,
# but no conversion of output to terminal
# results in �
$ perl -Mutf8 -M'5.28.0' -e 'say "É"' | hexdump -C
00000000  c9 0a                                             |..|
00000002
# string literals converted to perl string format,
# AND conversion of output
$ perl -Mutf8 -M'open ":std", ":encoding(UTF-8)"' -M'5.28.0' -e 'say "É"' |hexdump -C
00000000  c3 89 0a                                          |...|
00000003
And finally
# entirely transparent because input is decoded 
# and reencoded on output
# use utf8 has no effect in this very basic example
$ echo É | perl -Mutf8 -M'open ":std", ":encoding(UTF-8)"' -M'5.28.0' -pe '' | hexdump -C
00000000  c3 89 0a                                          |...|
00000003
In short, we have to assume that strings get converted to Perl's internal format at some point, and set up the output encoding accordingly.
 
     
    