XML::Twig question

thandi has asked for the wisdom of the Perl Monks concerning the following question:


Re: XML::Twig question
by Tanktalus (Canon) on Dec 12, 2006 at 14:48 UTC

Um, how far have you gotten? I presume you have some code to show - even if it's just the part that points at the XML file and uses XML::Twig. You probably should have the constructor for XML::Twig, but maybe it's wrong.

I would suggest starting with that. Then reading the XML::Twig documentation. It points to the XML::Twig website, including a quick reference card and a FAQ. These should get you going. Then, if you run into trouble with that, come on back with your code, and we can see how best to help you.

Good luck!

(PS - I also recommend checking out the fabulous help on PM for constructing your nodes...)


Re: XML::Twig question
by zentara (Cardinal) on Dec 12, 2006 at 15:55 UTC

PerlSax

Just feed your XML file to the following script and watch the output. Then you can set up your handlers to filter out whatever you want. The advantage of this approach is that you can feed it huge files and it will process nodes as it finds them.

#!/usr/bin/perl
use warnings;
use strict;
use XML::Parser::PerlSAX;

my $parser = XML::Parser::PerlSAX->new( Handler => SampleHandler->new );

$parser->parse( Source => { SystemId => shift } );


package SampleHandler;

sub new {
   my $class = shift;
   return bless( {}, $class );
}

sub start_document { print "start_document\n"; }
sub end_document   { print "end_document\n"; }

sub start_element {
   my ( $self, $element ) = @_;
   my $name = $element->{ Name };
   print "start_element: '$name'\n";
   while ( my ( $k, $v ) = each %{ $element->{ Attributes } } ) {
      print "  attribute: $k = $v\n";
   }
}

####  a sample sub for parsing a specific node ###########
#sub start_element {
#   my ( $self, $element ) = @_;
#   my $name = $element->{ Name };
#
## print "start_element: '$name'\n";
#   if ( $name eq 'node' ) {
#      my %node = %{ $element->{ Attributes } };
#      print $node{ 'id' }, ' ', $node{ 'lat' }, ' ', $node{ 'lon' };
#   }
#}
########################################



sub end_element {
   my ( $self, $element ) = @_;
   my $name = $element->{ Name };
   print "end_element: '$name'\n";
}

sub characters {
   my ( $self, $text ) = @_;
   my $data = $text->{ Data };
   print "characters: '$data'\n";
}

I'm not really a human, but I play one on earth. Cogito ergo sum a bum


Re: XML::Twig question
by mirod (Canon) on Dec 12, 2006 at 16:28 UTC

First your "XML" file is not really XML. That makes it hard to show you some examples of what you can do with XML::Twig. Then it is not clear from your question what you mean by "tag". Is it just the start tag, the entire element or the text of the element? The difference between tag and element is an important one in XML.
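To make the tag/element distinction concrete, here is a minimal sketch. The one-line document is made up for illustration; start_tag, sprint and text are XML::Twig::Elt methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Made-up one-element document, just to show the three notions side by side.
my $twig = XML::Twig->new;
$twig->parse('<AI n="AAA"><Desc>AI AAA desc</Desc></AI>');
my $ai = $twig->root;

print "start tag: ", $ai->start_tag, "\n";    # the opening tag only
print "element  : ", $ai->sprint,    "\n";    # tags plus everything inside
print "text     : ", $ai->text,      "\n";    # character data only
```

Which of the three you want decides which method you call in a handler.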

Then, if all you want is to extract information from the file, you might want to have a look at xml_grep, which comes with XML::Twig. Have a look at the docs. If your files are not too big (i.e. XML::LibXML can load them in memory), you can also use xml_grep2, by the same author, which has more complete XPath support (once again, at the cost of loading the entire document in memory).

Otherwise, the code below will print the start tags of the AI elements in AC elements with an n attribute of CCC:

XML::Twig->new( twig_handlers => { 'AC[@n="CCC"]//AI' => sub { print $_->start_tag, "\n"; } })->parsefile( "my.xml");

If you know that AI elements will always be direct children of AC elements, you can replace the '//' with a single '/'. If your XML file might be big, you could add a $_->purge at the end of the anonymous sub, in order to release some memory, or you could use the twig_roots option instead of twig_handlers.
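A rough sketch of the purging variant, under a few assumptions: the inline sample document is invented (cut down from the kind of structure discussed here) so the script runs on its own, the @seen array is only there for illustration, and purge is called on the twig object passed as the handler's first argument rather than on $_.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Assumed sample data; replace parse($xml) with parsefile("my.xml") for a real file.
my $xml = <<'XML';
<CL>
  <AC n="AAA"><AI n="AAA"><Desc>skipped</Desc></AI></AC>
  <AC n="CCC"><AI n="BBB"><Desc>kept</Desc></AI></AC>
</CL>
XML

my @seen;    # Desc texts of the matching AI elements (illustration only)

my $twig = XML::Twig->new(
    twig_handlers => {
        # '/' instead of '//': AI is assumed to be a direct child of AC
        'AC[@n="CCC"]/AI' => sub {
            my ( $t, $ai ) = @_;
            push @seen, $ai->first_child_text('Desc');
            print $ai->start_tag, "\n";
            $t->purge;    # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);
```

Purging only pays off on large inputs; on a file this small it makes no visible difference.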

I hope that gets you started.


Re^2: XML::Twig question

by thandi (Novice) on Dec 12, 2006 at 19:15 UTC

That's what I'm really after, plus the text between <Desc>???</Desc>, starting from just below the <AC> tag. The text between the <Desc>???</Desc> is quite important because it can be changed/amended.

Also, is it possible to substitute "CCC" below with a variable, e.g. my $CCC = "CCC"?
XML::Twig->new( twig_handlers => { 'AC@n=$CCC//AI' => sub { print $_->start_tag, "\n"; }->parsefile( "my.xml");

ID will always be direct children of AI
AI will always be direct children of AC
All of the above will always have a <Desc>???</Desc> below them describing what they're all about.

<World n="earth" >
  <Space n="XXX">
    <CL n="XXX">
      <Desc>CL desc</Desc>
      <Other/>
      <AC n="AAA" set="n">
        <Desc>AC AAA desc</Desc>
      
        <AI n="AAA" set="n">
          <Desc>AI AAA desc</Desc>
          <ID n="AAA" set="y">
            <Desc>AAA ID desc</Desc>
            <What>What AAA ID </What>
            <AR>ID_aaa</AR>
          </ID>
            
          <ID n="BBB" set="y">
            <Desc>BBB ID desc</Desc>
            <What>What BBB ID </What>
            <AR>ID_bbb</AR>
          </ID>
        </AI>
      
        <AI n="BBB" set="y">
          <Desc>AI BBB desc</Desc>
          <ID n="AAA" set="y">
            <Desc>AAA ID desc</Desc>
            <What>What AAA ID </What>
            <AR>ID_aaa</AR>
          </ID>
        </AI>
      
      </AC>

      <AC n="CCC" set="y">

        <AI n="AAA" set="n">
          <Desc>AI AAA desc</Desc>
          <ID n="AAA" set="y">
            <Desc>AAA ID desc</Desc>
            <What>What AAA ID </What>
            <AR>ID_aaa</AR>
          </ID>
        </AI>
      
        <AI n="AAA" set="n">
          <Desc>AI AAA desc</Desc>
          <ID n="AAA" set="y">
            <Desc>AAA ID desc</Desc>
            <What>What AAA ID </What>
            <AR>ID_aaa</AR>
          </ID>
        </AI>

      </AC>
    </CL>
  </Space>
</World>


Re^3: XML::Twig question

by mirod (Canon) on Dec 12, 2006 at 19:52 UTC

Here is what it would look like:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;


my( $ac_n_value, $ai_att_cond)= @ARGV;

XML::Twig->new( twig_handlers => { qq{AC[\@n="$ac_n_value"]//AI[$ai_att_cond]} => \&print_ai_data })
         ->parsefile( "test_thandi.xml");

sub print_ai_data
  { my( $t, $ai)= @_;
    print "DESC: ", $ai->first_child( 'Desc')->sprint, "\n",
          "ID  : ", $ai->first_child( 'ID')  ->sprint, "\n"
          ;
  }

You call this with the value you want for the n attribute of the AC element, and the condition for the AI element:

perl test_thandi CCC '@n="AAA"'
perl test_thandi CCC '@n="AAA" or @set="y"'

A couple of comments on the code: Perl and XPath strings don't mix very well. You can use alternate quotes (qq{}) to avoid the collision between Perl's interpolating quotes and XPath attribute quotes (or you can use ' instead of " in the XPath expression), but you need to backslash the @ used for attribute conditions in XPath so that Perl does not interpolate it as an array. An alternate method is to use sprintf to build the XPath expression.
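The sprintf route could look something like this. The helper name ac_ai_trigger is made up for this sketch and is not part of XML::Twig; it only builds the string you would use as the twig_handlers key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not an XML::Twig function): builds the handler
# trigger for AI elements below an AC element with a given n attribute.
sub ac_ai_trigger {
    my ( $ac_n_value, $ai_att_cond ) = @_;

    # The format is single-quoted, so Perl interpolates neither @n nor
    # the double quotes, and no backslashes are needed.
    return sprintf( 'AC[@n="%s"]//AI[%s]', $ac_n_value, $ai_att_cond );
}

my $trigger = ac_ai_trigger( 'CCC', '@n="AAA" or @set="y"' );
print "$trigger\n";    # prints: AC[@n="CCC"]//AI[@n="AAA" or @set="y"]
```

Note that this does no escaping: a value containing a double quote would still produce a broken expression.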

This code loads the entire document in memory, which you may or may not want. There are techniques to avoid this described in the XML::Twig Tutorial.

Also, you need the development version of XML::Twig to be able to run this; you can get it from xmltwig.com.


Re^4: XML::Twig question

by Bizza (Initiate) on Dec 19, 2006 at 21:50 UTC

Re^5: XML::Twig question

by mirod (Canon) on Dec 19, 2006 at 22:28 UTC

Some notes below your chosen depth have not been shown here

Re: XML::Twig question
by Jenda (Abbot) on Dec 29, 2006 at 15:46 UTC

Apart from XML::Twig you could also use XML::Rules like this:

use XML::Rules;

my $AC_n = 'CCC';

my $parser = XML::Rules->new(
  rules => [
    _default => 'as is',
     # by default keep both attributes and _content
    'Desc,What,AR' => 'content',
     # for those keep just the content
    '^AC' => sub {return ($_[1]->{n} eq $AC_n)},
     # only process the <AC> tags whose n attribute equals the $AC_n
    AC => '',
     # once processed, forget the contents of the <AC> tag
    AI => sub {print "Found AI: n=$_[1]->{n}, desc=$_[1]->{Desc}\n"; return},
     # for each processed AI tag print the n attribute and Desc subtag.
     # thanks to the 2nd rule you don't have to write
     # $_[1]->{Desc}{_content}
     # As this rule returns nothing, the contents of the tag are not remembered.
  ],
);

$parser->parse($XML);

Jenda
Support Denmark!
Defend the free world!
