dingus has asked for the wisdom of the Perl Monks concerning the following question:
As well as some Latin-1 accented characters, my data also has some (valid) HTML entities such as &plusmn; ( ± ) and &le; ( ≤ ). Unfortunately, no matter what I do, XML::Parser always barfs on these entities. I've tried changing the charset, whether declared in the file or passed as a raw parameter (and, when playing with XML::Twig, using keep_encoding), and I've switched the top-level XML package (XML::Simple or XML::Twig), all to no avail.
HTML::Entities correctly recognises and converts these entities, so that's presumably not the issue.
Is this an XML::Parser bug (and if so, is it due to an old version of XML::Parser)? Or am I just completely misunderstanding something? I would greatly prefer not to have to convert these entities manually before handing off to the parser; ideally I'd like them left untouched, since none of the parsers needs to change them.
Versions
HTML::Entities Version: 1.23;
XML::Parser Version: 2.27;
Sample always failing code
Error: undefined entity at line 3, column 17, byte 129 at c:/Perl/site/lib/XML/Parser.pm line 168

    use Data::Dumper;
    use XML::Simple;
    my $xml = <<EOXML;
    <rec id = 'F600' type = 'J'>
    <author>A. S. Bõmmarius, K. Drauz, W. Hummel, M.-R. Kula, C. Wandrey</author>
    <text>&lt; &amp; &ge; &ograve; &plusmn;</text>
    </rec>
    EOXML
    my $xmlref = XMLin($xml);
Dingus
PS This entity and accent crud is almost enough to make me use regexes for XML parsing :)
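For context on the error above: XML itself predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;), so names such as &ge; or &plusmn; are undefined unless a DTD declares them. One possible workaround that leaves the entities in the data untouched is to prepend a DOCTYPE whose internal subset declares them. The following is only a minimal sketch under stated assumptions: the `%entity_codepoint` table and the `with_entity_declarations` helper are hypothetical names invented here, the table covers just the entities mentioned in the question, and the input is assumed to have no XML declaration of its own (a DOCTYPE would have to come after one).

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny table: entity name => Unicode code point.
# Extend it with whatever HTML entity names the data actually contains.
my %entity_codepoint = (
    plusmn => 177,     # plus-minus sign
    le     => 8804,    # less-than or equal to
    ge     => 8805,    # greater-than or equal to
    ograve => 242,     # o with grave accent
);

# Hypothetical helper: prepend a DOCTYPE whose internal subset declares
# each entity as a numeric character reference. $root must match the
# name of the document's root element.
sub with_entity_declarations {
    my ($xml, $root) = @_;
    my $decls = join '',
        map { sprintf qq{  <!ENTITY %s "&#%d;">\n}, $_, $entity_codepoint{$_} }
        sort keys %entity_codepoint;
    return "<!DOCTYPE $root [\n$decls]>\n$xml";
}

my $xml = q{<rec><text>&lt; &amp; &ge; &ograve; &plusmn;</text></rec>};
my $declared = with_entity_declarations($xml, 'rec');
print $declared, "\n";

# Parsing would then be the same call as in the sample:
#   my $xmlref = XMLin($declared);
```

With the declarations in place the parser should expand the named entities itself, so the document body does not need to be rewritten beforehand.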
Replies are listed 'Best First'.
Re: XML::Parser and &entity;
by mirod (Canon) on Nov 26, 2002 at 16:45 UTC
by dingus (Friar) on Nov 26, 2002 at 18:00 UTC
by mirod (Canon) on Nov 26, 2002 at 19:12 UTC
by mirod (Canon) on Nov 26, 2002 at 17:46 UTC
Re: XML::Parser and &entity;
by Anonymous Monk on Nov 26, 2002 at 16:54 UTC