VirtualRider has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm using the XML::SAX::PurePerl parser because I can't use anything other than pure Perl.

Is there a way to send unknown entities to a default handler, as is possible with XML::Parser? I need to do more complicated things with those entities than replace them with text.

Right now I'm also stuck on a weird problem: my XML file contains <tag>&lt;SOMETHING&gt;</tag>, which seems to be resolved to <tag><SOMETHING></tag> and leads to an 'Invalid element name' error. I can't believe this is the proper behavior. Any suggestions?

Feel free to ask if I should explain anything in more detail.

Thank you

VR

UPDATE: I accidentally added &lt; and &gt; to the DTD entities. This caused the 'Invalid element name' error.

I think declaring my entities in the DTD could work, but it's rather inconvenient to generate <!ENTITY entname "replacement"> for each of my custom entities.
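
A minimal sketch of generating those declarations from a hash rather than writing each one by hand. The entity table, the build_doctype helper, and the root element name are all hypothetical, and the sketch assumes the replacement texts are plain text, so it rejects special characters instead of escaping them:

```perl
use strict;
use warnings;

# Hypothetical entity table; the names and replacement texts are placeholders.
my %entities = (
    entname => 'replacement',
    vendor  => 'Example Corp',
);

# The five entities XML predefines. Redeclaring lt and gt carelessly is what
# triggered the 'Invalid element name' error mentioned in the update above.
my %predefined = map { $_ => 1 } qw(lt gt amp apos quot);

# Return a DOCTYPE declaration whose internal subset declares each entity.
sub build_doctype {
    my ($root, $ents) = @_;
    my @decls;
    for my $name (sort keys %$ents) {
        die "refusing to redeclare predefined entity '$name'\n"
            if $predefined{$name};
        my $value = $ents->{$name};
        # Assumption: replacement texts are plain text. Characters that are
        # special inside an entity value are rejected rather than escaped.
        die "unsupported character in value of '$name'\n"
            if $value =~ /[<&%"]/;
        push @decls, qq{  <!ENTITY $name "$value">};
    }
    return join "\n", "<!DOCTYPE $root [", @decls, "]>";
}

# Assemble a complete document that uses one of the declared entities.
my $xml = join "\n",
    '<?xml version="1.0"?>',
    build_doctype('root', \%entities),
    '<root>&entname;</root>';

print "$xml\n";
```

The resulting string could then be handed to the parser, for example with parse_string. This only automates the declarations; it does not provide a callback for undeclared entities.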

Replies are listed 'Best First'.
Re: XML::SAX::PurePerl, handle entities
by ikegami (Patriarch) on Aug 12, 2010 at 16:02 UTC

    this seems to be resolved to <tag><SOMETHING></tag>

    This makes no sense. There's no way it would return that as text. You should get at least 3 callbacks: element start, one or more strings of characters, element end. And you do:

    use strict;
    use warnings;

    BEGIN {
        package MySAXHandler;
        use parent 'XML::SAX::Base';

        sub start_element { print "element $_[1]{Name}\n"; }
        sub end_element   { print "element end\n"; }
        sub characters    { print "text $_[1]{Data}\n"; }
    }

    #use XML::SAX;
    use XML::SAX::PurePerl;

    #my $parser = XML::SAX::ParserFactory->parser(
    my $parser = XML::SAX::PurePerl->new(
        Handler => MySAXHandler->new(),
    );
    $parser->parse_uri("foo.xml");

    foo.xml:

    <root><tag>&lt;SOMETHING&gt;</tag></root>

    Output:

    element root
    element tag
    text <
    text SOMETHING
    text >
    element end
    element end

    It sounds like you're building XML from text without converting the text to XML.

    Because you blamed the module instead of showing your broken code, I have no time to look into whether the parser is capable of handling DTDs or provides an implicit mechanism for entity definition.

      I am not blaming the module, and I'm afraid I can't show you my 'broken code', but this is how I ran into the lt/gt issue (which I can avoid):

      $parser->parse_string("<?xml version=\"1.0\"?> <!DOCTYPE root [ <!ENTITY lt \"<\"> <!ENTITY gt \">\"> ]> <root>&lt;SOMETHING&gt;</root>");

      The XML I'm working on is well-formed, weird and kinda big. Currently I'm facing an "End tag mismatch" error, but I haven't figured out what's causing it.

        That code doesn't run. Something about $parser not having a value. You'd think it would be an XML::SAX::PurePerl object, but it doesn't have a parse_string method. What are you talking about?!

        And why are you trying to redefine &lt; and &gt;? Those are two of the five entities XML understands natively (&lt;, &gt;, &amp;, &apos; and &quot;). I've even shown that XML::SAX::PurePerl handles them.

        Update: Must have run a bad test. It does indeed have parse_string.