in reply to Re^2: Encode throws "Wide character in subroutine entry" when using XML::Simple
in thread Encode throws "Wide character in subroutine entry" when using XML::Simple

That is as counterintuitive as a thing can be. You have a file that is encoded in UTF-8 and has a UTF-8 byte order mark in it, yet to solve a problem with it not being interpreted properly as UTF-8 text by a module, you have to use binmode, not the proper UTF-8 encoding layer :encoding(UTF-8). It just doesn't make sense. Who would intuit that? Obviously, not I. :-(

Re^4: Encode throws "Wide character in subroutine entry" when using XML::Simple
by ikegami (Patriarch) on Dec 12, 2010 at 01:05 UTC

    The encoding of the document is specified in the document, not externally (e.g. the system's locale or an HTTP header). Determining the encoding requires parsing the document, so it's up to the XML parser to do the decoding. This is why XML is considered a binary (application/) format, not a text (text/) format.

    If it were up to the caller to decode the content, as you suggest, the caller would have to parse the XML to determine the encoding before passing the XML to the parser. That's what makes no sense.
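A minimal sketch of handing the parser undecoded bytes, assuming XML::Simple is installed. The helper names `read_bytes` and `parse_xml_file` are made up for illustration and are not part of any module. The file is opened with `:raw`, so no decoding layer is applied, and the parser is left to work out the encoding from the BOM and the XML declaration.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: read a file as raw bytes, with no decoding layer.
sub read_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    return $bytes;
}

# Hypothetical helper: pass the undecoded bytes straight to the parser,
# which determines the encoding from the document itself.
sub parse_xml_file {
    my ($path) = @_;
    return XMLin(read_bytes($path));
}
```

Usage would look like `my $data = parse_xml_file('example.xml');`, where `example.xml` is a placeholder file name.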

    "yet to solve a problem with it not being interpreted properly as UTF-8 text by a module"

    That's not true. The module expects UTF-8 and treats its input as such. The problem is that by decoding the text yourself, you're passing a string of characters that is no longer encoded using UTF-8.
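To illustrate why already-decoded text trips up a consumer that expects UTF-8 bytes, here is a small sketch using only the core Encode module. The sample document is made up, and the second `decode` call only stands in for whatever decoding the parser does internally, which is an assumption on my part. After decoding, the BOM becomes the single character U+FEFF, which does not fit in a byte, so decoding the string a second time should fail on the Encode versions I am aware of.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes of a small UTF-8 document starting with a byte order mark (EF BB BF).
my $bytes = "\xEF\xBB\xBF"
          . qq{<?xml version="1.0" encoding="UTF-8"?>}
          . "<doc>caf\xC3\xA9</doc>";

# Roughly what an :encoding(UTF-8) layer hands you: characters, not bytes.
my $chars = decode('UTF-8', $bytes);

# The BOM is now one character, U+FEFF, which is too wide for a byte.
my $first_char = ord(substr($chars, 0, 1));

# A consumer that still expects UTF-8 bytes will try to decode again.
# The eval traps the exception so the outcome can be reported.
my $second_decode_ok = eval { decode('UTF-8', $chars); 1 } ? 1 : 0;

printf "first character: U+%04X\n", $first_char;
print  "second decode ", ($second_decode_ok ? "succeeded" : "failed"), "\n";
```

The exact wording of the exception varies between Encode versions, so the sketch reports only whether the second decode succeeded.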