This is a follow-up to yesterday's question (XML Simple Charset Q?) about parsing data.

As well as some Latin-1 accented characters, I also have some (valid) HTML entities such as &plusmn; ( ± ) and &le; ( ≤ ). Unfortunately, no matter what I do, XML::Parser always barfs on these entities. I've tried changing the charset I use in the file or pass as a raw parameter (and even using keep_encoding when playing with XML::Twig), and switching the top-level XML package (XML::Simple or XML::Twig), all to no avail.

My HTML::Entities correctly recognises and converts the entities, so that's presumably not the issue.

Is this an XML::Parser bug (and if so, is it due to an old version of XML::Parser)? Or am I just completely misunderstanding something? I would greatly prefer not to have to manually convert these entities before handing off to the parser, and ideally I'd like them left untouched, since none of the parsers needs to change them.

Versions
HTML::Entities Version: 1.23;
XML::Parser Version: 2.27;

Sample always failing code

use Data::Dumper;
use XML::Simple;

my $xml = <<EOXML;
<rec id = 'F600' type = 'J'>
<author>A. S. B&#245;mmarius, K. Drauz, W. Hummel, M.-R. Kula, C. Wandrey</author>
<text>&lt; &amp; &ge; &#242; &plusmn;</text>
</rec>
EOXML

$xmlref = XMLin($xml);

Error: undefined entity at line 3, column 17, byte 129 at c:/Perl/site/lib/XML/Parser.pm line 168
Line 3 col 17 appears to be the &ge;
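
For context, XML itself predefines only five entities (&lt; &gt; &amp; &apos; &quot;), so names like &ge; and &plusmn; are undefined to an XML parser unless a DTD declares them. One possible workaround that leaves the entities in the data untouched is to prepend a DOCTYPE whose internal subset declares them. The following is only a sketch under stated assumptions: the helper with_entity_decls is hypothetical (not part of any module), the code points 8805 (≥) and 177 (±) are supplied by hand, and I have not checked it against XML::Parser 2.27.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (not from any module): prepend a DOCTYPE whose
# internal subset declares each named entity as a numeric character reference.
sub with_entity_decls {
    my ($root, $xml, %entities) = @_;
    my $decls = '';
    for my $name (sort keys %entities) {
        $decls .= sprintf qq{  <!ENTITY %s "&#%d;">\n}, $name, $entities{$name};
    }
    return "<!DOCTYPE $root [\n$decls]>\n$xml";
}

# Single-quoted heredoc so nothing in the XML is interpolated by Perl.
my $xml = <<'EOXML';
<rec id = 'F600' type = 'J'>
<text>&lt; &amp; &ge; &#242; &plusmn;</text>
</rec>
EOXML

# Assumption: only these two HTML entities occur in the data.
my $doc = with_entity_decls('rec', $xml, ge => 8805, plusmn => 177);
print $doc;

# Parse only if XML::Simple is installed; report rather than die on failure.
my $ref = eval { require XML::Simple; XML::Simple::XMLin($doc) };
if ($ref) {
    print "text: $ref->{text}\n";
}
else {
    warn "Parse skipped or failed: $@";
}
```

The data itself is not modified; only a prolog is added, so any entity not listed in the hash would still be reported as undefined.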

Dingus
PS This entity and accent crud is almost enough to make me use regexes for XML parsing :)


In reply to XML::Parser and &entity; by dingus
