in reply to Re: Encode throws "Wide character in subroutine entry" when using XML::Simple

Backwards. That would cause the problem.
open(my $fh, '<:encoding(UTF-8)', $file);
should be
open(my $fh, '<', $file); binmode($fh);

Re^3: Encode throws "Wide character in subroutine entry" when using XML::Simple
by Jim (Curate) on Dec 12, 2010 at 00:39 UTC

    That is as counterintuitive as a thing can be. You have a file that is encoded in UTF-8 and has a UTF-8 byte order mark in it, yet to solve a problem with it not being interpreted properly as UTF-8 text by a module, you have to use binmode, not the proper UTF-8 encoding layer :encoding(UTF-8). It just doesn't make sense. Who would intuit that? Obviously, not I. :-(

      The encoding of the document is specified in the document, not externally (e.g. the system's locale or an HTTP header). Determining the encoding requires parsing the document, so it's up to the XML parser to do the decoding. This is why XML is considered a binary (application/) format, not a text (text/) format.

      If it were up to the caller to decode the content as you claim, the caller would have to parse the XML to determine the encoding before passing the XML to the parser. That's what makes no sense.
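      To make that concrete, here is a minimal sketch of the first thing a parser has to do with the raw bytes. The helper name sniff_xml_encoding is made up for illustration, and it handles only a UTF-8 BOM and an ASCII-compatible XML declaration; a real parser does considerably more.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple or XML::Parser):
# guess the encoding of an XML document from its raw, undecoded bytes.
sub sniff_xml_encoding {
    my ($bytes) = @_;

    # Drop a UTF-8 byte order mark from our local copy, if present.
    $bytes =~ s/\A\xEF\xBB\xBF//;

    # Look for encoding="..." inside the XML declaration.
    if ($bytes =~ /\A<\?xml[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        return $2;
    }

    # No declaration: assume UTF-8 (UTF-16 detection is not handled here).
    return 'UTF-8';
}

my $with_bom = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="utf-8"?><a/>';
my $latin1   = '<?xml version="1.0" encoding="ISO-8859-1"?><a/>';

print sniff_xml_encoding($with_bom), "\n";
print sniff_xml_encoding($latin1),   "\n";
print sniff_xml_encoding('<a/>'),    "\n";
```

      The point is only that this inspection works on bytes. A caller who wanted to pick the right :encoding(...) layer up front would have to do the same work before the parser ever saw the file.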

      yet to solve a problem with it not being interpreted properly as UTF-8 text by a module

      That's not true. It is expected to be UTF-8 by the module and treated as such. The problem is that by decoding the text, you're passing text that's not encoded using UTF-8 anymore.
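      A small sketch of that difference, using the core Encode module directly rather than the code path inside XML::Simple (so treat it as an illustration, not a trace of what the module does). The variable names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes of a tiny UTF-8 document that starts with a BOM (EF BB BF).
my $bytes = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="utf-8"?><a/>';

# Roughly what an :encoding(UTF-8) layer hands back: characters, not bytes.
my $chars = decode('UTF-8', $bytes);

# In the byte string the first element is the byte 0xEF.
my $first_byte_ord = ord(substr($bytes, 0, 1));

# In the decoded string the BOM has become the single character U+FEFF,
# which no longer fits in a byte.
my $first_char_ord = ord(substr($chars, 0, 1));

# Anything that now tries to decode this string as UTF-8 bytes fails,
# because the string contains a character above 0xFF.
my $second_decode_ok = eval { decode('UTF-8', $chars); 1 } ? 1 : 0;

printf "first byte: 0x%X\n", $first_byte_ord;
printf "first char: 0x%X\n", $first_char_ord;
print  "second decode ", ($second_decode_ok ? "succeeded" : "died: $@"), "\n";
```

      The exact wording of the error from the second decode depends on the Encode version, so the sketch only checks that it dies.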

Re^3: Encode throws "Wide character in subroutine entry" when using XML::Simple
by nglenn (Beadle) on Dec 12, 2010 at 01:36 UTC

    Nope. This still throws the same errors:

    open(my $fh, '<', $file);
    binmode($fh);
    my $xml = XML::Simple->new();
    $self->{ettx} = $xml->XMLin($fh,
        ForceArray => ['map'],
        KeyAttr    => {},
    )->{table};

      That code doesn't run. Please provide the code you actually ran. For starters, I don't know if you've removed

      local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

      or not.

        Sorry, I only sent a clip of a rather large program.

        I didn't remove the preferred parser setting.

        Here's a sample xml file:

        <?xml version="1.0" encoding="utf-8" standalone="yes"?>
        <ettx ver="2">
          <table id="{4fa6cd7a-f7b6-416d-8f59-3acc0eab9bdb}" name="TestFile">
            <level type="V">
              <map sync="Title" src="some unicode chars"/>
            </level>
          </table>
        </ettx>

        Here's the code where the XML parsing takes place:

        package ETTX;
        use strict;
        use warnings;
        use XML::Simple;
        local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

        sub new(){#scalar file name
            my $class = shift;
            my $self = {
                ettxFile => '',
                ettx     => {},
            };
            bless $self, $class;
            load($self,shift) if @_ ==1;
            return $self;
        };

        sub load(){#scalar file name
            my ($self, $ettxFile) = @_;
            print "loading $ettxFile";
            open(my $fh, '<', $ettxFile);
            binmode($fh);
            my $xml = XML::Simple->new();
            $self->{ettx} = $xml->XMLin($fh,
                ForceArray => ['map'],
                KeyAttr    => {},
            )->{table};
            $self->{ettxFile} = $ettxFile;
            1;
        }

        I call it from somewhere else like this:

        my $ettx = ETTX->new();
        $ettx->load($ettxFile);

        A UTF-8 file with a BOM kills it; one without a BOM is fine.