in reply to XML::Twig question

First, your "XML" file is not really XML, which makes it hard to show you examples of what you can do with XML::Twig. Second, it is not clear from your question what you mean by "tag". Is it just the start tag, the entire element, or the text of the element? The difference between tag and element is an important one in XML.

Then, if all you want is to extract information from the file, you might want to have a look at xml_grep, which comes with XML::Twig. Have a look at the docs. If your files are not too big (i.e. XML::LibXML can load them in memory), you can also use xml_grep2, by the same author, which has more complete XPath support (once again, at the cost of loading the entire document in memory).

Otherwise, the code below will print the start tags of the AI elements in AC elements with an n attribute of CCC:

XML::Twig->new( twig_handlers => { 'AC[@n="CCC"]//AI' => sub { print $_->start_tag, "\n"; } })
         ->parsefile( "my.xml");

If you know that AI elements will always be direct children of AC elements, you can replace the '//' with a single '/'. If your XML file might be big, you could add a $_->purge at the end of the anonymous sub to release some memory, or you could use the twig_roots option instead of twig_handlers.
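
As a rough sketch of the purge idea, assuming a version of XML::Twig that accepts an attribute condition on the AC step (as the one-liner above does): the handler below collects the n attribute of each matching AI and then purges what has been parsed so far. The helper name ai_names_in_ac and the tiny inline document are made up for illustration, and the sketch calls purge on the twig object passed to the handler rather than on $_.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the n attribute of every AI element that is
# a direct child of an AC element whose n attribute equals $ac_n.
sub ai_names_in_ac {
    my( $xml, $ac_n)= @_;
    my @names;
    XML::Twig->new(
        twig_handlers => {
            # qq{} keeps the XPath quotes readable; \@ stops Perl from
            # interpolating @n as an array
            qq{AC[\@n="$ac_n"]/AI} => sub {
                my( $t, $ai)= @_;
                push @names, $ai->att( 'n');
                $t->purge;    # discard the elements parsed so far
            },
        },
    )->parse( $xml);
    return @names;
}

# Made-up sample data, much smaller than a real file
my $xml = <<'XML';
<CL>
  <AC n="AAA"><AI n="A1"/></AC>
  <AC n="CCC"><AI n="C1"/><AI n="C2"/></AC>
</CL>
XML

print join( ", ", ai_names_in_ac( $xml, 'CCC')), "\n";
```

For a real file you would call parsefile instead of parse; the string is used here only to keep the example self-contained.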

I hope that gets you started.

Re^2: XML::Twig question
by thandi (Novice) on Dec 12, 2006 at 19:15 UTC
    What I need is to match the condition on the AC element (i.e. search for n="CCC", or whatever other value is being searched for), and from there to check, in a similar fashion, the 'n=???' and/or 'set=???' conditions on the AI element. That should then let me extract the text from the other tags between the <ID> and </ID> tags.

    That's what I'm really after, plus the text between <Desc>???</Desc> just below the <AC> tag. The text between <Desc>???</Desc> is quite important because it can be changed/amended.

    Also, is it possible to replace "CCC" below with a variable, e.g. my $CCC = "CCC"?
    XML::Twig->new( twig_handlers => { 'AC@n=$CCC//AI' => sub { print $_->start_tag, "\n"; }->parsefile( "my.xml");

    ID will always be direct children of AI
    AI will always be direct children of AC
    All of the above will always have a <Desc>???</Desc> below them describing what they're all about.

    <World n="earth" >
      <Space n="XXX">
        <CL n="XXX">
          <Desc>CL desc</Desc>
          <Other/>
          <AC n="AAA" set="n">
            <Desc>AC AAA desc</Desc>
            <AI n="AAA" set="n">
              <Desc>AI AAA desc</Desc>
              <ID n="AAA" set="y">
                <Desc>AAA ID desc</Desc>
                <What>What AAA ID </What>
                <AR>ID_aaa</AR>
              </ID>
              <ID n="BBB" set="y">
                <Desc>BBB ID desc</Desc>
                <What>What BBB ID </What>
                <AR>ID_bbb</AR>
              </ID>
            </AI>
            <AI n="BBB" set="y">
              <Desc>AI BBB desc</Desc>
              <ID n="AAA" set="y">
                <Desc>AAA ID desc</Desc>
                <What>What AAA ID </What>
                <AR>ID_aaa</AR>
              </ID>
            </AI>
          </AC>
          <AC n="CCC" set="y">
            <AI n="AAA" set="n">
              <Desc>AI AAA desc</Desc>
              <ID n="AAA" set="y">
                <Desc>AAA ID desc</Desc>
                <What>What AAA ID </What>
                <AR>ID_aaa</AR>
              </ID>
            </AI>
            <AI n="AAA" set="n">
              <Desc>AI AAA desc</Desc>
              <ID n="AAA" set="y">
                <Desc>AAA ID desc</Desc>
                <What>What AAA ID </What>
                <AR>ID_aaa</AR>
              </ID>
            </AI>
          </AC>
        </CL>
      </Space>
    </World>

      Here is what it would look like:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use XML::Twig;

      my( $ac_n_value, $ai_att_cond)= @ARGV;

      XML::Twig->new( twig_handlers => { qq{AC[\@n="$ac_n_value"]//AI[$ai_att_cond]} => \&print_ai_data })
               ->parsefile( "test_thandi.xml");

      sub print_ai_data
        { my( $t, $ai)= @_;
          print "DESC: ", $ai->first_child( 'Desc')->sprint, "\n",
                "ID : ",  $ai->first_child( 'ID')->sprint, "\n";
        }

      You call this with the value you want for the n attribute of the AC element, and the condition for the AI element:

      perl test_thandi CCC '@n="AAA"'
      perl test_thandi CCC '@n="AAA" or @set="y"'

      A couple of comments on the code: Perl and XPath strings don't mix very well. You can use alternate quotes (qq{}) to avoid the collision between Perl's interpolating quotes and XPath's attribute quotes (or you can use ' instead of " in the XPath expression), but you still need to backslash the @ used for attribute conditions in XPath so that Perl does not interpolate it as an array. An alternative is to use sprintf to build the XPath expression.
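
      The sprintf alternative could look roughly like this. The sub name ai_trigger is invented for the example; the point is only that a single-quoted format string keeps @ and " literal, so nothing needs backslashing.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the handler trigger from the AC n value and
# the AI attribute condition. The format string is single-quoted, so Perl
# does not try to interpolate @n.
sub ai_trigger {
    my( $ac_n_value, $ai_att_cond)= @_;
    return sprintf 'AC[@n="%s"]//AI[%s]', $ac_n_value, $ai_att_cond;
}

print ai_trigger( 'CCC', '@n="AAA" or @set="y"'), "\n";
# prints: AC[@n="CCC"]//AI[@n="AAA" or @set="y"]
```

      The returned string can then be used as the key in the twig_handlers hash in place of the qq{} expression.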

      This code loads the entire document in memory, which you may or may not want. There are techniques to avoid this described in the XML::Twig Tutorial.

      Also, you need the development version of XML::Twig to be able to run this; you can get it from xmltwig.com.

        Below are the queries that I need to formulate, based on the XML at the bottom. I'll also have to add a new node, e.g. starting from the <ID>..</ID>. Thanks beforehand, thandi

        1. List all ACs in the XML document
           result should be:
             AC desc AC=CCC, CL=CCC
             AC desc AC=AAA, CL=CCC
           # these are the <Desc> elements just below the start tag <AC>
           I also need to be able to update the text if requested.
        2. List all AIs in a given AC, e.g. AC=CCC
           result should be:
             AI desc AI=BBB, AC=CCC
             AI desc AI=XXX, AC=CCC
        3. List all <ID> nodes in a given <AI n="XXX">
           result should be:
             a. ID desc ID=XXX, AI=BBB, AC=CCC
                What XXX ID
                ID_XXX
             b. ID desc ID=ZZZ, AI=BBB, AC=CCC
                What ZZZ ID
                ID_ZZZ

        Then, based on the user selection, I need to update the requested text within an element or add a whole new node.

        <World n="earth" >
          <Space n="XXX">
            <CL n="XXX">
              <Desc>Class description</Desc>
              <Other/>
              <AC n="AAA" set="n">
                <Desc>AC desc AC=AAA, CL=CCC</Desc>
                <AI n="AAA" set="n">
                  <Desc>AI descr AI=AAA, AC=AAA</Desc>
                  <ID n="AAA" set="y">
                    <Desc>ID desc ID=AAA, AI=AAA, AC=AAA</Desc>
                    <What>What ID=AAA AI=AAA, AC=AAA</What>
                    <AR>ID_aaa</AR>
                  </ID>
                  <ID n="BBB" set="y">
                    <Desc>ID desc ID=BBB, AI=AAA, AC=AAA</Desc>
                    <What>What ID=BBB AI=AAA, AC=AAA</What>
                    <AR>ID_bbb</AR>
                  </ID>
                </AI>
                <AI n="BBB" set="y">
                  <Desc>AI desc, AI=BBB, AC=AAA</Desc>
                  <ID n="AAA" set="y">
                    <Desc>ID desc ID=AAA, AI=BBB, AC=AAA</Desc>
                    <What>What AAA ID=BBB, AI=BBB, AC=AAA </What>
                    <AR>ID_bbb</AR>
                  </ID>
                </AI>
              </AC>
              <AC n="CCC" set="y">
                <Desc>AC desc ACC=CCC, CL=CCC</Desc>
                <AI n="AAA" set="n">
                  <Desc>AI desc AI=BBB, AC=CCC</Desc>
                  <ID n="AAA" set="y">
                    <Desc>ID desc ID=AAA, AI=AAA, AC=CCCC</Desc>
                    <What>What AAA ID=AAA, AI=AAA, AC=CCC </What>
                    <AR>ID_aaa</AR>
                  </ID>
                </AI>
                <AI n="XXX" set="n">
                  <Desc>AI desc AI=XXX, AC=CCC</Desc>
                  <ID n="XXX" set="y">
                    <Desc>ID desc ID=XXX, AI=BBB, AC=CCC</Desc>
                    <What>What XXX ID </What>
                    <AR>ID_XXX</AR>
                  </ID>
                  <ID n="ZZZ" set="y">
                    <Desc>ID desc ID=ZZZ, AI=BBB, AC=CCC</Desc>
                    <What>What ZZZ ID </What>
                    <AR>ID_ZZZ</AR>
                  </ID>
                </AI>
              </AC>
            </CL>
          </Space>
        </World>