I am writing a Perl script that needs to extract some data from an XML file.
The XML file itself is encoded in UTF-8. For some reason, however, what I extract from the file ends up encoded as ISO-8859-1. The XML::Parser documentation states that the strings passed to my handlers are UTF-8, but that is not what I get.
The parser is basically something like this:
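use strict;
use warnings;
use XML::Parser;

my $data = {};            # extracted text, keyed by record id and field name
my ( $curId, $curField ); # set elsewhere (not shown)

my $parser = XML::Parser->new( Handlers => {
    # Some unrelated handlers here
    Char => sub {
        my ( $expat, $string ) = @_;
        # Char can fire several times for one text node, so append;
        # .= on a missing hash entry simply starts from the empty string
        $data->{$curId}{$curField} .= $string;
    },
} );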
I have tried the following variants for actually parsing:
- file parsed directly through $parser->parsefile, no options;
- file parsed directly through $parser->parsefile, with the ProtocolEncoding option;
- file opened using open( $handle , "<file.xml" ) then parsed through $parser->parse;
- file opened using open( $handle , '<:utf8' , "file.xml" ) then parsed through $parser->parse.
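The four variants above can be put side by side in one script. This is a minimal sketch, not my actual script: it assumes XML::Parser is installed, writes a hypothetical temporary file whose only text is "é" (U+00E9, stored as the UTF-8 bytes C3 A9), and runs each variant through a hypothetical helper `char_codepoints` that records the code points the Char handler received. Printing code points in hex, instead of the text itself, keeps the comparison independent of how the terminal decodes the output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Parser;

# Hypothetical test file: the text of <r> is "é" (U+00E9), stored on disk
# as the two UTF-8 bytes C3 A9.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} qq{<?xml version="1.0" encoding="utf-8"?><r>\xC3\xA9</r>};
close $fh or die "close $file: $!";

# Hypothetical helper: runs one parse variant and returns the code points
# (in hex) of everything the Char handler received.
sub char_codepoints {
    my ($run) = @_;
    my $text = '';
    my $parser = XML::Parser->new( Handlers => {
        Char => sub { my ( $expat, $string ) = @_; $text .= $string },
    } );
    $run->($parser);
    return join ' ', map { sprintf '%x', ord } split //, $text;
}

# The four variants from the list above.
my %variants = (
    'parsefile' => sub { $_[0]->parsefile($file) },
    'parsefile + ProtocolEncoding' =>
        sub { $_[0]->parsefile( $file, ProtocolEncoding => 'UTF-8' ) },
    'open "<" + parse' => sub {
        open my $in, '<', $file or die "open $file: $!";
        $_[0]->parse($in);
    },
    'open "<:utf8" + parse' => sub {
        open my $in, '<:utf8', $file or die "open $file: $!";
        $_[0]->parse($in);
    },
);

my %seen;
for my $name ( sort keys %variants ) {
    # eval so that one failing variant does not stop the comparison
    $seen{$name} = eval { char_codepoints( $variants{$name} ) } // "error: $@";
    print "$name: $seen{$name}\n";
}
```

Each variant builds a fresh parser so that one run cannot affect the next; the `eval` wrapper means a variant that dies is reported rather than aborting the whole comparison.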
In addition, I have tried each variant with and without the <?xml version="1.0" encoding="utf-8"?> declaration in the file.
In all cases, what ends up in $data->{$curId}{$curField} is encoded using ISO-8859-1.
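One way to check that claim is to look at the code points of the stored value rather than at how it appears once printed. The sketch below uses only core Perl; `$value` is a hypothetical stand-in for `$data->{$curId}{$curField}`, upgraded so that Perl's internal UTF8 flag is set, and `bytes_written` is a hypothetical helper that writes the value through an in-memory handle and reports the resulting bytes. It shows that one and the same character string comes out as the single Latin-1 byte E9 through a handle with no encoding layer, and as the UTF-8 bytes C3 A9 through a handle with an `:encoding(UTF-8)` layer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $data->{$curId}{$curField}: one character, é (U+00E9).
my $value = "\x{e9}";
utf8::upgrade($value);    # set the internal UTF8 flag on the string

# Hypothetical helper: write $value through an in-memory handle opened with
# the given layer and return the resulting bytes in hex.
sub bytes_written {
    my ($layer) = @_;
    my $buf = '';
    open my $out, ">$layer", \$buf or die "open: $!";
    print {$out} $value;
    close $out;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

my $chars = join ' ', map { sprintf 'U+%04X', ord } split //, $value;
my $raw   = bytes_written('');                    # no encoding layer
my $utf8  = bytes_written(':encoding(UTF-8)');    # explicit UTF-8 layer

print "code points: $chars\n";    # code points: U+00E9
print "no layer:    $raw\n";      # no layer:    E9
print "UTF-8 layer: $utf8\n";     # UTF-8 layer: C3 A9
```

Comparing code points with the bytes written separates what is stored in the hash from what an output handle does with it.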
What am I doing wrong?