carcassonne has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Is anyone out there using XML::Twig successfully to read XML entities? What I have is along the following lines. A version variable is stored in an entities file, and a reference to it is made in a main XML file:

main XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % component-entities SYSTEM "component.ent">
%general-entities;
]>
<component>Engine-&engine-version;</component>

Whereas the entities file has:

<!ENTITY engine-version "3.2">

I've run Dumper on the file as loaded by XML::Twig, and the version entity, for instance, is not expanded in memory.

Is it possible to access these values using Twig? If not, which XML module do you use to achieve this in the simplest way?

Cheers.

Re: XML::Twig and ENTITY declarations
by carcassonne (Pilgrim) on Mar 18, 2009 at 00:01 UTC
    Looking at it a bit further, and adding the expand_external_ents option, the following error is returned:

    syntax error at line 65, column 0, byte 2841:
    <![%sgml.features;[
    ^
     at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser.pm line 188
     at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 469

    So I've looked in Parser.pm at line 188 and put a debug statement to see what could be $arg:

    eval {
        print "DEBUG: arg: $arg\n";
        $result = $expat->parse($arg);
    };

    And $arg is in fact a file that starts like:

    <!DOCTYPE sect1 [<!-- ...................................................................... -->
    <!-- DocBook XML DTD V4.5 ................................................. -->
    <!-- File docbookx.dtd .................................................... -->
    <!-- Copyright 1992-2006 HaL Computer Systems, Inc., O'Reilly & Associates, Inc., ArborText, Inc., Fujitsu Software

    And at line 65 of that file is the sgml.features line:

    <!-- Enable SGML features ................................................. -->
    <!ENTITY % sgml.features "IGNORE">
    <![%sgml.features;[
    <!ENTITY % xml.features "IGNORE">
    ]]>

    Doing a /usr-wide search, I found that there are several of these files in /share/sgml/docbook/, for instance:

    ./share/sgml/docbook/xml-dtd-4.5-1.0-33.fc8/docbookx.dtd

    Could the XML::Twig error mean that such a file is broken? If I do not care about the DOCTYPE declaration at this stage, can I still use 'straight' ENTITY declarations (without the DOCTYPE) so that XML::Twig can actually process them?

    Does this make any sense ;-) ?

Re: XML::Twig and ENTITY declarations
by mirod (Canon) on Mar 18, 2009 at 05:34 UTC

    Well, if you define component-entities and then you use general-entities, any software will have trouble figuring out where to get the value you want.

    Then you do indeed need to remove the external DTD reference (the PUBLIC identifier) from the doctype declaration. Having to do that is probably a bug in XML::Twig; I have to check some more.

    Once that's done, you need to use the parse_param_ent option to get the value to be read. That option is undocumented because it's inherited directly from XML::Parser. I'll add the doc about it in XML::Twig.

    Once that's done, your file looks like this:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE sect1 [
    <!ENTITY % component-entities SYSTEM "component.ent">
    %component-entities;
    ]>
    <component>Engine-&engine-version;</component>

    and you can see how it's processed by doing this:

    perl -MXML::Twig -e'XML::Twig->new( parse_param_ent => 1)->parsefile( "ent1.xml")->print'
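The same check can be written as a small script. This is only a sketch: it assumes XML::Twig is installed, and the scratch directory, the write_file helper and the $component_text variable are illustrative names, not part of the original post. It recreates the two files from this thread, parses the main one with parse_param_ent switched on, and prints the text of the root element so you can see whether &engine-version; was replaced.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use XML::Twig;

# Scratch directory holding both files, so the relative SYSTEM
# identifier "component.ent" resolves next to the main document.
my $dir = tempdir( CLEANUP => 1 );

# Hypothetical helper: write $content to $name inside the scratch directory.
sub write_file {
    my ( $name, $content ) = @_;
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "cannot close $path: $!";
    return $path;
}

# The entities file from the question.
write_file( 'component.ent', qq{<!ENTITY engine-version "3.2">\n} );

# The main file, with the external DTD reference removed and the
# parameter entity declared and referenced under the same name.
my $xml = write_file( 'ent1.xml', <<'XML' );
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 [
<!ENTITY % component-entities SYSTEM "component.ent">
%component-entities;
]>
<component>Engine-&engine-version;</component>
XML

# parse_param_ent is passed through to XML::Parser so that the
# parameter entity in the internal subset is actually read.
my $twig = XML::Twig->new( parse_param_ent => 1 );
$twig->parsefile($xml);

my $component_text = $twig->root->text;
print "$component_text\n";
```

Running it once with parse_param_ent and once without is a quick way to compare the two behaviours on your own installation.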
      Hi, the code is actually more than this and I made a mistake when cutting it down to an example. So no, I do define and use the same entity.

      Thanks for pointing out the use of parse_param_ent ! Now it works as it should, and access to ENTITY variables is easy. Yes, it ought to be found in the XML::Twig docs.

      I've noticed that seemingly it cannot be used along with keep_encoding. In my case, that's fine and I won't bother with using both.

      On the other hand, the actual DOCTYPE declaration (the one with the PUBLIC identifier) is not handled at all. I have to remove it (e.g. sed it out) from the XML files. That could be a normal thing to do in a way, since it points to an external network resource. I'm not an XML expert.

      The following:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
        "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
      <!ENTITY % component-entities SYSTEM "components.ent">
      %component-entities;
      ]>

      Yields the following error:

      cannot expand > - cannot load 'http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd' at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/XML/Parser/Expat.pm line 469

      Whereas the following is all OK:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <!DOCTYPE sect1 [
      <!ENTITY % component-entities SYSTEM "components.ent">
      %component-entities;
      ]>
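Instead of sed, the external identifier can be dropped in Perl before the document is handed to the parser. The following is a minimal sketch, not a general XML tool: strip_external_id and $with_public are hypothetical names, and the regex assumes a plain DOCTYPE like the ones in this thread (it would be fooled by DOCTYPE-like text inside a comment or CDATA section that appears earlier in the file). It keeps the root name and the internal subset, and removes only the PUBLIC or SYSTEM identifier that follows the root name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the PUBLIC/SYSTEM external identifier from the
# first DOCTYPE declaration, leaving the root name and internal subset alone.
sub strip_external_id {
    my ($xml) = @_;

    # A quoted literal, in either double or single quotes.
    my $quoted = qr{(?:"[^"]*"|'[^']*')};

    # $1 is "<!DOCTYPE" plus the root element name; whatever follows the
    # external identifier (internal subset or closing ">") is untouched.
    $xml =~ s{(<!DOCTYPE\s+[^\s\[>]+)\s+(?:PUBLIC\s+$quoted\s+$quoted|SYSTEM\s+$quoted)}{$1};

    return $xml;
}

# The failing document from this post, with a root element added.
my $with_public = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % component-entities SYSTEM "components.ent">
%component-entities;
]>
<component>Engine-&engine-version;</component>
XML

print strip_external_id($with_public);
```

The cleaned string could then be passed to XML::Twig's parse method rather than parsefile; in that case the relative path components.ent would presumably be resolved against the current directory, so treat that part as something to verify locally.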

      Thanks.

Re: XML::Twig and ENTITY declarations
by Anonymous Monk on Mar 18, 2009 at 04:59 UTC
      Anonymous, thanks for the complete test example - it's very much appreciated.