in reply to XML::SAX::PurePerl, handle entities

this seems to be resolved to <tag><SOMETHING></tag>

This makes no sense. There's no way it would return that as text. You should get at least three callbacks: element start, one or more strings of characters, and element end. And you do:

use strict;
use warnings;

BEGIN {
    package MySAXHandler;
    use parent 'XML::SAX::Base';
    sub start_element { print "element $_[1]{Name}\n"; }
    sub end_element   { print "element end\n"; }
    sub characters    { print "text $_[1]{Data}\n"; }
}

#use XML::SAX;
use XML::SAX::PurePerl;

#my $parser = XML::SAX::ParserFactory->parser(
my $parser = XML::SAX::PurePerl->new(
    Handler => MySAXHandler->new(),
);

$parser->parse_uri("foo.xml");
<root><tag>&lt;SOMETHING&gt;</tag></root>
element root
element tag
text <
text SOMETHING
text >
element end
element end
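The output above shows the text of one element arriving in three separate characters callbacks, so a handler that wants the whole string has to join the chunks itself. Here is a minimal sketch of that buffering. The TextCollector package is hypothetical, and the callbacks are driven by hand instead of by a parser so that the snippet runs without XML::SAX installed.

```perl
use strict;
use warnings;

# Hypothetical handler (not part of XML::SAX): buffers character chunks
# and stores the joined text when the element ends.
package TextCollector;

sub new {
    my ($class) = @_;
    return bless { buffer => '', texts => [] }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    $self->{buffer} = '';                  # start a fresh buffer
}

sub characters {
    my ($self, $chars) = @_;
    $self->{buffer} .= $chars->{Data};     # append each chunk
}

sub end_element {
    my ($self, $el) = @_;
    push @{ $self->{texts} }, $self->{buffer};
    $self->{buffer} = '';
}

package main;

# Simulate the callback sequence shown in the output above.
my $h = TextCollector->new;
$h->start_element({ Name => 'tag' });
$h->characters({ Data => $_ }) for '<', 'SOMETHING', '>';
$h->end_element({ Name => 'tag' });

print "text $h->{texts}[0]\n";             # text <SOMETHING>
```

This simple version resets the buffer on every start tag, so it only suits elements without mixed content.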

It sounds like you're building XML from text without converting the text to XML.
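If that is the cause, the fix is to escape the text before placing it between tags. A minimal sketch, assuming a hypothetical helper named xml_escape (it is not part of XML::SAX) that handles only the characters significant in element content:

```perl
use strict;
use warnings;

# Hypothetical helper: turn plain text into XML character data.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $xml = '<tag>' . xml_escape('<SOMETHING>') . '</tag>';
print "$xml\n";              # <tag>&lt;SOMETHING&gt;</tag>
```

Text destined for attribute values would also need its quote characters escaped.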

Because you blamed the module instead of showing your broken code, I have no time to look into whether the parser can handle DTDs or provides an implicit mechanism for defining entities.

Re^2: XML::SAX::PurePerl, handle entities
by VirtualRider (Initiate) on Aug 12, 2010 at 17:17 UTC

    I am not blaming the module, and I'm afraid I can't show you my 'broken code', but this is how I ran into the lt-gt issue (which I can avoid):

    $parser->parse_string("<?xml version=\"1.0\"?> <!DOCTYPE root [ <!ENTITY lt \"<\"> <!ENTITY gt \">\"> ]> <root>&lt;SOMETHING&gt;</root>");

    The XML I'm working on is well-formed, weird and kinda big. Currently I'm facing an "End tag mismatch" error, but I haven't figured out what's causing it.

      That code doesn't run. Something about $parser not having a value. You'd think it would be an XML::SAX::PurePerl object, but it doesn't have a parse_string method. What are you talking about?!

      And why are you trying to redefine &lt; and &gt;? Those are two of the five entities XML understands natively. I've already shown that XML::SAX::PurePerl handles them.

      Update: Must have run a bad test. It does indeed have parse_string.

        parse_string is similar to parse_uri; if I remember correctly, both are defined by XML::SAX::Base. I don't wanna redefine lt and gt. They were just in a set of entities I need to replace/handle, and I've removed the native XML entities from that set, so this isn't a problem.

        If there is no way to send unknown entities to a handler/method, I need to generate the DOCTYPE entity section at runtime and deal with the entities after parsing. I was just looking for a way to do this at parse time and avoid generating the DOCTYPE section.
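Generating the DOCTYPE entity section at runtime could look roughly like the sketch below. The entity table, its values and the doctype_for helper are all hypothetical, and whether XML::SAX::PurePerl actually expands entities declared this way is not confirmed anywhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical entity table; names and values are illustrative only.
my %entities = (
    copy => '&#169;',
    nbsp => '&#160;',
);

# Build an internal DTD subset that declares each entity.
sub doctype_for {
    my ($root, $ents) = @_;
    my $decls = join '', map {
        my $value = $ents->{$_};
        $value =~ s/"/&#34;/g;             # keep the quoted literal intact
        qq{  <!ENTITY $_ "$value">\n}
    } sort keys %$ents;
    return "<!DOCTYPE " . $root . " [\n" . $decls . "]>\n";
}

print doctype_for('root', \%entities);
# <!DOCTYPE root [
#   <!ENTITY copy "&#169;">
#   <!ENTITY nbsp "&#160;">
# ]>
```

The resulting string would be prepended to the document, after any XML declaration, before it is handed to parse_string.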

        The end-tag mismatch is a major problem and could have something to do with the size of the XML file (the tags are correct), but I won't be able to look at it until tomorrow.

        #! /usr/bin/perl
        use strict;
        use warnings;

        BEGIN {
            package MySAXHandler;
            use parent 'XML::SAX::Base';
            sub start_element { print "element $_[1]{Name}\n"; }
            sub end_element   { print "element end\n"; }
            sub characters    { print "text $_[1]{Data}\n"; }
        }

        use XML::SAX::PurePerl;

        my $parser = XML::SAX::PurePerl->new(
            Handler => MySAXHandler->new(),
        );

        $parser->parse_string("<?xml version=\"1.0\"?> <!DOCTYPE root [ <!ENTITY lt \"<\"> <!ENTITY gt \">\"> ]> <root>&lt;SOMETHING&gt;</root>");