XML::Parser requires an XML document. It honours BOMs and the encoding attribute of the XML declaration. I confirmed this by testing.

But you don't pass it an XML document. An XML document is a sequence of bytes, but you decoded the XML document into characters. It's the parser's job to decode the values it returns using the encoding specified inside the document, so you must not strip the character encoding before handing the document to the parser. Fix:

    # Remove Content-Encoding (e.g. compression),
    # but leave document as bytes.
    my $xml = $response->decoded_content( charset => 'none' );
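To illustrate the difference between the encoded document and its decoded form, here is a minimal sketch using only the core Encode module. The document in `$bytes` is a made-up stand-in for a response body, not anything from the original thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical response body: a UTF-8 encoded document whose text is
# U+00E9, stored as the two bytes 0xC3 0xA9.
my $bytes = qq{<?xml version="1.0" encoding="UTF-8"?><r>\xC3\xA9</r>};

# What decoding the charset does: the bytes become characters.
my $chars = decode('UTF-8', $bytes);

# The two strings no longer have the same length, and the decoded one
# no longer matches what its own encoding declaration promises.
printf "bytes: %d elements\n", length($bytes);
printf "chars: %d elements\n", length($chars);
```

The decoded string still claims encoding="UTF-8", but it is no longer UTF-8 encoded, which is why the parser should be given the undecoded form.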

Why did your code work if it was buggy? Because there's also a bug in XML::Parser::Expat. Expat incorrectly uses Perl's internal representation of the string as the XML document instead of using the contents of the string. Most of the time, your bug and this bug cancel out to produce the right output.

Here's the workaround for the bug in XML::Parser::Expat (it does nothing most of the time):

    # Expat expects the string to use this internal format.
    utf8::downgrade($xml) if $] ge '5.008';
    $p->parse($xml);
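The distinction between a string's contents and its internal representation can be seen without XML::Parser at all. The following is a rough sketch using the built-in utf8:: functions; the strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical document fragment; "\xE9" is one character with ordinal 0xE9.
my $xml  = "<r>\xE9</r>";
my $copy = $xml;

# Switch the copy to Perl's internal UTF-8 storage. The contents do not change.
utf8::upgrade($copy);
printf "same contents: %s\n",    ($copy eq $xml ? "yes" : "no");
printf "upgraded storage: %s\n", (utf8::is_utf8($copy) ? "yes" : "no");

# Switch back to one byte per character, the format the workaround relies on.
utf8::downgrade($copy);
printf "upgraded storage after downgrade: %s\n",
    (utf8::is_utf8($copy) ? "yes" : "no");

# A string holding a character above 255 cannot be downgraded. With a true
# second argument, utf8::downgrade returns false instead of dying.
my $wide = "<r>\x{263A}</r>";
my $ok   = utf8::downgrade($wide, 1);
printf "wide string downgraded: %s\n", ($ok ? "yes" : "no");
```

Note that without the second argument, utf8::downgrade dies on a string like `$wide`. That should not happen with a document that was left as bytes, since every element of such a string is below 256.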

In reply to Re: LWP::Agent vs. XML::Parser - the zillionth encoding madness question by ikegami
in thread LWP::Agent vs. XML::Parser - the zillionth encoding madness question by .rhavin
