Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello
My code:
#! /usr/bin/perl
use strict;
use XML::Twig;

my $gobjXML;
my $ghshAvail = {};

eval {
    $gobjXML = XML::Twig->new(
        twig_handlers => { availability_messages => \&xxx }
    );
    $gobjXML->parsefile("test.xml");
    $gobjXML->purge();
    $gobjXML->dispose();
    undef($gobjXML);
};

sub xxx {
    my($objt,$objnode) = @_;
    my $exp = qq(./message[\@id]);
    foreach my $node ($objnode->get_xpath($exp)) {
        if( ! defined($ghshAvail->{$node->{att}->{'id'}})) {
            $ghshAvail->{$node->{att}->{'id'}} = $node->text();
        }
        print "The attrib is " . $node->{att}->{"id"} . " the value is " . $node->text() . "\n";
    }
}
The XML:
<?xml version="1.0"?>
<xmlfeed>
  <availability_messages>
    <message id="a1"><![CDATA[temporarily out of stock - will ship in 1-2 weeks]]></message>
    <message id="a2"><![CDATA[This item is temporarily<br> out of stock and will<br> ship in approximately<br> <b>1 - 2 weeks.</b> <font size=1><a href="http://www.drugstore.com/cat/10661/tmpl/default.asp?catid=15604#15448">learn more</a></font>]]></message>
    <message id="a3"><![CDATA[currently in stock]]></message>
    <message id="a4"><![CDATA[* This item is currently <b>in stock.</b> *]]></message>
    <message id="a5"><![CDATA[temporarily out of stock - will ship in 2-4 weeks]]></message>
    <message id="a6"><![CDATA[This item is temporarily<br> out of stock and will<br> ship in approximately<br> <b>2 - 4 weeks.</b> <font size=1><a href="http://www.drugstore.com/cat/10661/tmpl/default.asp?catid=15604#15448">learn more</a></font>]]></message>
    <message id="a7"><![CDATA[temporarily unavailable from manufacturer - will ship in 4-6 weeks]]></message>
    <message id="a8"><![CDATA[This item is currently unavailable<br> from the manufacturer and will<br> ship in approximately <b>4 - 6 weeks.</b><br> <font size=1><a href="http://www.drugstore.com/cat/10661/tmpl/default.asp?catid=15604#15448">learn more</a></font>]]></message>
  </availability_messages>
</xmlfeed>
The code and the sample XML above are cut down from production code, where the process parses a really huge XML file. The code works correctly with the sample XML shown. The issue arises when I run the same code against the larger XML, which has the elements above plus lots of other stuff: the values for the attribute "id" are swapped or random. That is, once I do an XPath query like "./message[@id]", I get "currently in stock" as the value for "a1", and "a3" gets "temporarily out of stock...". Do you know of any issue with XML::Twig in this kind of situation?

Edited by castaway: added code tags.

Replies are listed 'Best First'.
Re: XML::Twig issues
by mirod (Canon) on Sep 22, 2003 at 20:00 UTC

    Well, it is going to be hard for me to debug your code if you give me data that doesn't exhibit the problem you describe. You should really try to extract the smallest data set that causes the problem and work from there.

    I have a few comments on your code, but nothing that should prevent it from working:

    • instead of wrapping the creation of the twig and the parse in an eval, you could create the twig normally, then use safe_parsefile, and test $@ to see if the parse succeeded,
    • if you use a recent (3.00 or above) version of XML::Twig and of Perl (5.6.0 or above) with Scalar::Util then the purge/dispose/undef dance is essentially useless. At the end of the block the object is destroyed. Correct me if I am wrong and I will fix it ;--)
    • $objnode->children( q{message[@id]}) is probably faster, and simpler, than using a full blown $objnode->get_xpath( q(./message[@id])),
    • nothing in the docs tells you that $node->{att}->{'id'} is a valid way to access the attribute id. Only $node->att( 'id') is guaranteed to work in future releases (sorry, that's a pet peeve of mine; I know that the doc is quite overwhelming and that the attributes are indeed stored in a hash, but users breaking the OO model here prevents me, for example, from pooling attribute values, which could decrease the memory needed to process some documents).
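
A minimal sketch of how the suggestions above might fit together, not the poster's or mirod's actual code. It assumes a recent XML::Twig, inlines a trimmed copy of the sample data so it is self-contained, and uses safe_parse on a string where the original would use safe_parsefile on test.xml. The names %avail and collect_messages are made up for the illustration, and the third message (without an id) is added only to show the filter at work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample data, inlined for the sketch (assumption: the real
# script would call safe_parsefile("test.xml") instead).
my $xml = <<'XML';
<?xml version="1.0"?>
<xmlfeed>
  <availability_messages>
    <message id="a1"><![CDATA[temporarily out of stock - will ship in 1-2 weeks]]></message>
    <message id="a3"><![CDATA[currently in stock]]></message>
    <message><![CDATA[no id attribute, so the condition skips it]]></message>
  </availability_messages>
</xmlfeed>
XML

my %avail;    # id => message text; the first occurrence of an id wins

# Handler called once the availability_messages element is fully parsed.
sub collect_messages {
    my ($twig, $section) = @_;

    # children() with a condition replaces get_xpath('./message[@id]')
    foreach my $msg ($section->children('message[@id]')) {
        my $id = $msg->att('id');    # documented accessor, not $msg->{att}
        $avail{$id} = $msg->text unless defined $avail{$id};
    }
}

my $twig = XML::Twig->new(
    twig_handlers => { availability_messages => \&collect_messages },
);

# safe_parse wraps the parse in an eval: it returns a true value on
# success and leaves the error message in $@ on failure.
if ($twig->safe_parse($xml)) {
    print "The attrib is $_ the value is $avail{$_}\n" for sort keys %avail;
}
else {
    warn "parse failed: $@";
}
```

No explicit purge/dispose/undef is done here, in line with the second point above; the twig is simply left to go out of scope.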