MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monkers,

I am trying to devise a way of getting information from an XML document. The document in question is created dynamically when a user chooses specific search criteria on, say, a web page, though I have no idea what criteria they may have chosen. (It's for a web service, if that helps.)

What I can't figure out is how to get a list of all the nodes that are present in the document and then locate the one I'm after, without having to type in each possible combination. I know the node I want to search for, but it may be nested deep within some other nodes, or it may be at the very top. I just don't know.

I have looked at XML::Simple, XML::Parser and XML::Smart, but I'm still not clued up.

Any help would be really great.
MonkPaul

Replies are listed 'Best First'.
Re: Unknown number of XML nodes
by davorg (Chancellor) on Nov 29, 2005 at 13:07 UTC

    Sounds like you want XML::XPath (or, alternatively, XML::LibXML which also has XPath support).
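
For illustration, a minimal sketch of the XML::XPath route. The sample document and the element name `price` are made up for this example; the point is only that `//price` matches the element at any depth.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical document: <price> appears at two different depths.
my $xml = <<'XML';
<results>
  <price>10</price>
  <group>
    <item><price>25</price></item>
  </group>
</results>
XML

my $xp = XML::XPath->new(xml => $xml);

# '//price' selects every <price> element, however deeply it is nested.
my @prices = map { $_->string_value } $xp->find('//price')->get_nodelist;

print "$_\n" for @prices;
```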

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Unknown number of XML nodes
by loris (Hermit) on Nov 29, 2005 at 13:44 UTC

    I would use XML::LibXML. Once you have parsed the XML and got hold of the root node, you can then do something like:

    my $name = 'NameOfTheNodeYouAreInterestedIn';
    my @matchingNodes = $rootNode->findnodes('//' . $name);

    Of course the XPath argument to findnodes may be a lot more complicated than my example.
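
To show the parsing step that the snippet above leaves out, here is a self-contained sketch. The sample XML and the node name `price` are assumptions made up for the example; `$rootNode` and `@matchingNodes` follow the naming used above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: <price> appears at two different depths.
my $xml = <<'XML';
<results>
  <price>10</price>
  <group>
    <item><price>25</price></item>
  </group>
</results>
XML

# Parse the string and get hold of the root node.
my $doc      = XML::LibXML->load_xml(string => $xml);
my $rootNode = $doc->documentElement;

# '//' means "anywhere in the document", so the depth does not matter.
my $name          = 'price';
my @matchingNodes = $rootNode->findnodes('//' . $name);

# Record where each match lives and what it contains.
my @found = map { [ $_->nodePath, $_->textContent ] } @matchingNodes;

printf "%s = %s\n", @$_ for @found;
```

`nodePath` is handy here because it reports where in the tree each match was found, which is exactly the part that is unknown in advance.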

    HTH,

    loris


    "It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ."
Re: Unknown number of XML nodes
by mirod (Canon) on Nov 29, 2005 at 14:09 UTC

    OK, I'll bite ;--)

    As usual with XML you have 3 options:

    • use a module that loads the document in memory and then find the node you want, most likely using XPath (you might need to build the XPath query from the "specific search criteria" you mentioned). XML::LibXML is probably your best bet: it is the most powerful and fastest of the lot,
    • use a module that processes the XML stream: XML::Parser or, better, XML::SAX (better because the SAX interface is a standard, even though it is actually LESS powerful than XML::Parser's interface; there are also a good number of SAX-based modules that can make your life easier, such as XML::SAX::Machines). In this case you will have to build the code that finds the node you are looking for, and of course you will be limited by the streaming nature of the process (no lookahead unless you code it, need to manage the context to figure out the ancestors/previous siblings of the current node). In short, a big PITA ;--)
    • XML::Twig (totally biased advice here ;--) finally, will let you use an XPath-inspired language to filter the input, and find the node you want. In this case also you are limited by the streaming nature of the filtering, although probably less than with a pure SAX module (you can also load the entire document in memory, and use XPath expressions to find the node you want).
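
To give an idea of the context management the streaming option involves, here is a rough sketch using XML::Parser. The sample XML and the element name `price` are made up, and the code assumes the wanted element never nests inside itself.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical document: <price> appears at two different depths.
my $xml = <<'XML';
<results>
  <price>10</price>
  <group>
    <item><price>25</price></item>
  </group>
</results>
XML

my $wanted = 'price';
my @stack;      # names of the currently open elements (the context)
my @found;      # one [ path, text ] pair per match
my $current;    # the pair being filled while inside a wanted element

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            push @stack, $element;
            # Start collecting text when the wanted element opens.
            $current = [ '/' . join('/', @stack), '' ] if $element eq $wanted;
        },
        Char => sub {
            my ($expat, $text) = @_;
            $current->[1] .= $text if $current;
        },
        End => sub {
            my ($expat, $element) = @_;
            # Store the finished pair when the wanted element closes.
            if ($current && $element eq $wanted) {
                push @found, $current;
                undef $current;
            }
            pop @stack;
        },
    },
);

$parser->parse($xml);

printf "%s = %s\n", @$_ for @found;
```

All the bookkeeping (`@stack`, `$current`) is what an XPath expression would otherwise do for you.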

    The use of XPath will really make your work much easier, so you should really forget about the XML::Simple/XML::Smart family (and also about the SAX stuff IMHO), and try XML::LibXML or XML::Twig.
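
For comparison, a minimal XML::Twig sketch that uses a handler to catch the element wherever it turns up. Again, the sample XML and the element name `price` are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: <price> appears at two different depths.
my $xml = <<'XML';
<results>
  <price>10</price>
  <group>
    <item><price>25</price></item>
  </group>
</results>
XML

my @found;    # one [ path, text ] pair per match

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <price> element has been completely parsed,
        # regardless of how deeply it is nested.
        price => sub {
            my ($t, $elt) = @_;
            push @found, [ $elt->path, $elt->text ];
        },
    },
);

$twig->parse($xml);

printf "%s = %s\n", @$_ for @found;
```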

    Does this help?

      That route seems best for determining the path as you all have suggested. XML::LibXML Ahoy!