in reply to Re: XML::Parser::Expat Question
in thread XML::Parser::Expat Question

Interesting that you mentioned that entities in attributes silently disappear... it was going to be my next point. The XML I'm parsing has a lot of information in the attributes that I need, and most of those attributes contain entities that I need resolved.

Let me think about what to do. I already have most of the parsing code written and my deadline is coming up soon, so I don't know if I'll be able to switch over to XML::Twig in time. I will look over the XML::Twig module and learn more about it. I do have a solution in place that opens the XML file before the Expat parse and resolves the entities beforehand. I don't like to do that, but it solves my problem.
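A minimal sketch of that kind of pre-processing step, assuming the entity values are already known and kept in a hash. The `%entities` table, its contents and the `resolve_entities` name are hypothetical illustrations, not code from this thread. Only entities found in the table are replaced; the predefined XML entities and character references are left for the parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical entity table; in practice these values would come from the DTD.
my %entities = (
    ent  => 'entity value',
    corp => 'Example Corp.',
);

# Replace &name; references whose name is in the table. Anything else
# (&amp;, &lt;, &#160;, unknown names) is left untouched for the parser.
# Values are inserted verbatim, so they must not contain markup
# characters such as < or &, or the quote character of the attribute.
sub resolve_entities {
    my ($xml, $table) = @_;
    $xml =~ s/&([A-Za-z_][\w.-]*);/exists $table->{$1} ? $table->{$1} : "&$1;"/ge;
    return $xml;
}

my $xml = '<doc att="an &ent; and an &amp;ent;"/>';
print resolve_entities($xml, \%entities), "\n";
# prints <doc att="an entity value and an &amp;ent;"/>
```

The resolved string can then be handed to `XML::Parser->new(...)->parse(...)` as usual. The obvious drawback is that this duplicates work the parser should be doing, and the escaping caveat above has to be handled by hand.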

Thanks for all your help. Greatly appreciated! I've been pondering a solution to this problem within Expat for some time. Since Expat provides an ExternEnt handler, I assumed I was doing something wrong! What would you use the ExternEnt handler for, then???

Thanks again!!!

Adam

Re^3: XML::Parser::Expat Question
by mirod (Canon) on Aug 18, 2004 at 22:46 UTC

    If I understand XML::Parser correctly, the ExternEnt handler is used for entities that refer to external files, but I don't think there is any built-in way to get to the DTD and the info inside it.
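    To illustrate what ExternEnt is for, here is a minimal sketch: the handler is called when the parser meets a reference to an external entity in content, and whatever string it returns is parsed as that entity's content. The `chapter` entity, the `chap1.xml` system id and the returned markup are hypothetical; no file is actually read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# "chapter" is declared as an external entity; chap1.xml is a
# hypothetical file name and is never opened.
my $doc = <<'XML';
<!DOCTYPE book [
  <!ENTITY chapter SYSTEM "chap1.xml">
]>
<book>&chapter;</book>
XML

my @elements;
XML::Parser->new(
    Handlers => {
        # ExternEnt args: (Parser, Base, Sysid, Pubid). Returning a string
        # supplies the entity content directly instead of fetching the file.
        ExternEnt => sub {
            my (undef, $base, $sysid, $pubid) = @_;
            return "<title>Loaded from $sysid</title>";
        },
        # Record every element name, including those inside the entity.
        Start => sub { push @elements, $_[1] },
    },
)->parse($doc);

print "@elements\n";    # prints "book title"
```

    Returning a filehandle or undef (which makes the parse fail) are the other documented options for this handler. Note that this only covers external entities referenced in content; it does not give access to the declarations in the DTD.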

    Actually if I read the code in XML::Twig properly (I wrote it quite a while ago), it just parses the DTD with a dummy document, gets the entity info, and uses it later when parsing the main document. And "I don't like to do that, but it solves my problem" ;--(
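    The dummy-document trick can be sketched with XML::Parser's Entity handler, which is called once for each entity declaration. This is an illustrative sketch, not XML::Twig's actual code; the `collect_entities` helper, the `dummy` root element and the sample declarations are assumptions. To stay self-contained it wraps the DTD text in an internal subset rather than reading a separate file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: parse DTD text inside a throwaway document and
# return a hash ref of internal general entity name => replacement text.
sub collect_entities {
    my ($dtd) = @_;
    my %entities;
    my $parser = XML::Parser->new(
        Handlers => {
            # Entity args: (Expat, Name, Val, Sysid, Pubid, Ndata, IsParam).
            # Val is defined only for internal entities.
            Entity => sub {
                my (undef, $name, $val, $sysid, $pubid, $ndata, $isparam) = @_;
                $entities{$name} = $val if defined $val && !$isparam;
            },
        },
    );
    $parser->parse("<!DOCTYPE dummy [\n$dtd\n]><dummy/>");
    return \%entities;
}

my $ents = collect_entities('<!ENTITY ent "entity value"> <!ENTITY corp "Example Corp.">');
print "$_ => $ents->{$_}\n" for sort keys %$ents;
```

    For a DTD kept in its own file, the same idea should work with a `SYSTEM` identifier on the dummy DOCTYPE plus the `ParseParamEnt => 1` option, which lets XML::Parser read the external DTD; that variant is not shown here. The resulting hash can then be used to fix up attribute values when parsing the real document.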

    About entities in attributes: the Default handler is properly called when an entity is found in an attribute value, but the problem is that you can't do much at that point, and by the time the Start handler is called, the entity has disappeared from the attribute value that gets passed to it. This is really annoying, especially as the predefined entities ('&amp;', '&lt;', '&gt;'...) get properly replaced.

    For example, this is scary, and shows that there isn't much that can be done that will work in all cases:

        #!/usr/bin/perl -w
        use strict;
        use XML::Parser;

        XML::Parser->new( Handlers => { Start => sub { print "att: '$_[3]'\n" } })
                   ->parse( '<!DOCTYPE doc SYSTEM "dummy"><doc att="an &ent; and an &amp;ent;"/>');
        # prints att: 'an and an &ent;'

      Sorry... didn't mean to sound like that. This is all very helpful and enlightening. I will check out the Twig code; I'm sure it will be helpful. Much appreciated!

      Adam