in reply to XML::parser question

This snippet uses XML::Simple to solve your problem as I understand it.

    use XML::Simple;

    my $monk = XMLin('./monk.xml',
        keyattr    => {val => 'val1'},
        forcearray => ['val'],
    );

    if($monk->{vals}->{val}->{FOO}) {
        print $monk->{say}, "\n";
    }

That's not to say XML::Simple will handle all your XML parsing needs but it can do the simple stuff. For more complex stuff the options boil down to using a SAX parser or an XPath-capable DOM module (or XML::Twig for a more Perlish API).

As you found, maintaining state when you use XML::Parser's handler API involves storing stuff in global variables (or using a closure if you're that way inclined). This is one of the reasons that the XML::Parser API is effectively deprecated in favour of XML::SAX, which uses a cleaner object-oriented style. However, with SAX you'll have the same problem you found with XML::Parser: having to write pages of code to perform a seemingly simple task.
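For what it's worth, here is a rough sketch of the closure approach with XML::Parser. The sample document and the say_if_foo name are my own inventions for illustration (the structure is guessed from the XML::Simple snippet above), so adjust to suit your real file.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document; structure assumed, not taken from the real file.
my $xml = <<'XML';
<monk>
  <vals>
    <val val1="FOO"/>
    <val val1="BAR"/>
  </vals>
  <say>Hello</say>
</monk>
XML

# Returns the text of <say> if any <val> has val1="FOO", otherwise undef.
sub say_if_foo {
    my ($xml) = @_;

    # State lives in lexicals captured by the handler closures, not in globals.
    my ($found_foo, $in_say, $say) = (0, 0, '');

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element, %attr) = @_;
                $found_foo = 1
                    if $element eq 'val' && ($attr{val1} // '') eq 'FOO';
                $in_say = 1 if $element eq 'say';
            },
            End => sub {
                my ($expat, $element) = @_;
                $in_say = 0 if $element eq 'say';
            },
            Char => sub {
                # May be called several times for one run of text, so append.
                my ($expat, $text) = @_;
                $say .= $text if $in_say;
            },
        },
    );
    $parser->parse($xml);

    return $found_foo ? $say : undef;
}

my $result = say_if_foo($xml);
print "$result\n" if defined $result;
```

Even for this trivial test you need three handlers and three state variables, which is the "pages of code" problem in miniature.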

The other option is to use XML::XPath or XML::LibXML to slurp the XML file into a DOM tree which you can query with XPath statements. eg:

    use XML::XPath;
    use XML::XPath::XMLParser;

    my $xp = XML::XPath->new(filename => './monk.xml');

    my $nodeset = $xp->find('/monk[./vals/val[@val1 = "FOO"]]');

    foreach my $node ($nodeset->get_nodelist) {
        my $say = $xp->findvalue('./say', $node);
        print "$say\n";
    }

XPath is something like regexes for XML. The $xp->find call returns the set of nodes which match the XPath expression (in this case all 'monk' elements which contain a 'val' element in a 'vals' element, where the 'val' element has a 'val1' attribute equal to the string 'FOO'). $xp->findvalue is then used to extract the contents of each returned node's 'say' child element.
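The same XPath expression should work unchanged with XML::LibXML. This is only a sketch: the inline sample document and the matching_says helper are assumptions of mine, and it parses a string rather than ./monk.xml so that it can be run as is.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; structure assumed, not taken from the real file.
my $xml = <<'XML';
<monk>
  <vals>
    <val val1="FOO"/>
    <val val1="BAR"/>
  </vals>
  <say>Hello</say>
</monk>
XML

# Returns the 'say' text of every 'monk' element matched by the XPath expression.
sub matching_says {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # findnodes returns a list of nodes in list context;
    # findvalue is then evaluated relative to each matched node.
    return map { $_->findvalue('./say') }
        $doc->findnodes('/monk[./vals/val[@val1 = "FOO"]]');
}

print "$_\n" for matching_says($xml);
```

To read the real file instead, load_xml(location => './monk.xml') is the equivalent call.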

For more info, see the Perl-XML FAQ.

Using regexes is not a great idea because although you can create something that works in simple cases, it's really hard to cover all your bases (e.g. what if the XML contains a numeric character entity or is UTF-16 encoded?). To see just how hard it is to do it right, take a look at the source of XML::SAX::PurePerl.
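To make the character entity case concrete, here is a small sketch. The two sample strings and the naive_has_foo helper are made up for illustration. An XML parser reports the same attribute value ('FOO') for both documents, because &#70; is just another way of writing 'F', but the obvious regex only recognises the first.

```perl
use strict;
use warnings;

# Two hypothetical documents that mean the same thing to an XML parser.
my $plain   = '<monk><vals><val val1="FOO"/></vals><say>Hello</say></monk>';
my $escaped = '<monk><vals><val val1="&#70;OO"/></vals><say>Hello</say></monk>';

# The obvious regex: look for the literal attribute text.
sub naive_has_foo {
    my ($xml) = @_;
    return $xml =~ /<val\s+val1="FOO"/ ? 1 : 0;
}

printf "plain:   %d\n", naive_has_foo($plain);    # prints "plain:   1"
printf "escaped: %d\n", naive_has_foo($escaped);  # prints "escaped: 0"
```

Single-quoted attributes, extra whitespace around the '=' and other attributes ahead of val1 would all trip it up in the same way.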

Re: Re: XML::parser question
by Hrunting (Pilgrim) on Oct 24, 2002 at 12:40 UTC
    It's important to note that XML::Simple has been ported to the XML::SAX API (as XML::SAX::Simple). This means you can get the ease of use of XML::Simple with the benefits of XML::SAX. One of the chief benefits is that you can use other parsers besides expat (the parser XML::Parser uses). expat is notoriously inefficient as an XML parser. libxml2 (the C library at the core of XML::LibXML) is much faster, and has already been ported to the XML::SAX API. So if you used XML::LibXML as your parsing library and XML::SAX::Simple as your parser, you could get a very fast, very easy solution to this problem.

      Actually, you don't need XML::SAX::Simple - matts put that quick hack together while I was integrating SAX support directly into XML::Simple. Version 1.08_01 of XML::Simple supports SAX natively. It can act as a handler in the way you suggest, it can also drive a SAX pipeline from a Perl data structure and it can do both at the same time for filtering.

        Ahh, that's great. Release it on CPAN so the rest of the world can view it. All I see is 1.08.