Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to parse XML documents that may contain the pound sterling symbol '£' (the character reference &#163; can't be used). XML::Parser dies when it encounters one, even though the ProtocolEncoding parse option is set to UTF8 (UTF-8 doesn't work either). The XML documents also declare their encoding as utf-8. Can this be done? Hope you can help.

Re: xml::parser support for utf8
by mirod (Canon) on May 31, 2001 at 17:30 UTC

    Are you sure the pound is UTF-8 encoded? Depending on the software that created the XML document, it might well be in Latin-1 (ISO-8859-1), in which case you should set ProtocolEncoding to 'ISO-8859-1'. Even better, declare the actual encoding in the XML declaration: as things stand, a document that has no encoding declaration but is not encoded in UTF-8 or UTF-16 might not be well-formed.
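    A minimal sketch of both fixes, assuming the file really is Latin-1. The sample document and the parse_text helper are made up for illustration; ProtocolEncoding and the Char handler are standard XML::Parser options. Expat supports ISO-8859-1 natively, so no extra encoding map should be needed.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample: "£5" encoded as Latin-1, so the pound is the single byte 0xA3.
my $latin1_body = "<price>\xA35</price>";

# Parse $xml with the given constructor options and return all character data.
sub parse_text {
    my ($xml, %opts) = @_;
    my $text = '';
    my $parser = XML::Parser->new(
        %opts,
        Handlers => { Char => sub { $text .= $_[1] } },  # may be called in chunks
    );
    $parser->parse($xml);    # dies if the document is not well-formed
    return $text;
}

# Fix 1: no declaration in the document, tell the parser the encoding.
my $t1 = parse_text($latin1_body, ProtocolEncoding => 'ISO-8859-1');

# Fix 2 (preferred): declare the real encoding in the XML declaration.
my $t2 = parse_text(qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n$latin1_body});

printf "U+%04X\n", ord substr($t1, 0, 1);    # code point of the first character
```

    Both calls should return the same two-character string, a pound sign followed by "5". Parsing the same bytes with no ProtocolEncoding and no (or a utf-8) declaration makes the parser die, which matches the symptom in the question.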

    UTF-8 is the default encoding for expat and XML::Parser, so you don't need to specify it with ProtocolEncoding. If the parser dies on a special symbol, it is usually because the symbol is encoded in something else, most often 'ISO-8859-1' if you use Western software (and use the pound symbol ;--)
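    One way to find out which case you are in is to look at the raw bytes before handing them to the parser. This sketch uses only the core Encode module; the looks_like_utf8 and pound_bytes helper names are hypothetical. In UTF-8 the pound sign is the two bytes C2 A3, while in ISO-8859-1 it is the single byte A3, which on its own is not valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Returns 1 if the octet string decodes cleanly as UTF-8, 0 otherwise.
sub looks_like_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK flag may consume its input, so use a copy
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Returns the bytes of the first pound sign found, in hex ('C2 A3' or 'A3'), or ''.
sub pound_bytes {
    my ($octets) = @_;
    return '' unless $octets =~ /(\xC2\xA3|\xA3)/;
    return join ' ', map { sprintf '%02X', ord } split //, $1;
}

# Usage: read the real file as raw bytes, e.g.
#   open my $fh, '<:raw', $file or die "$file: $!";
#   my $octets = do { local $/; <$fh> };
for my $octets ("<p>\xC2\xA35</p>", "<p>\xA35</p>") {
    printf "%-5s utf8=%d\n", pound_bytes($octets), looks_like_utf8($octets);
}
```

    If the check reports a lone A3 and the data is not valid UTF-8, the file is most likely Latin-1, and the utf-8 declaration in it is wrong.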