Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys and gals

So I am a little new to XML::Parser but I have read the docs and PODs and know that this module has what I need but I am a little stuck on the implementation.
I have this XML:

<?xml version='1.0' encoding='UTF-8'?>
<list name="name list">
  <person>
    <firstname>Paul</firstname>
    <lastname>Rutter</lastname>
    <age>24</age>
  </person>
  <person>
    <firstname>Ruth</firstname>
    <lastname>Brewster</lastname>
    <age>22</age>
  </person>
  <person>
    <firstname>Cas</firstname>
    <lastname>Creer</lastname>
    <age>23</age>
  </person>
</list>

and I have this perl script:

#!/usr/bin/perl
use XML::Parser;
use Data::Dumper;
use strict;
use warnings;

# declared with "my" so the script compiles under "use strict"
my $parser = new XML::Parser( Style => 'Tree' );
my $tree = $parser->parsefile( 'testxml.xml' );
print Dumper( $tree );

which prints the contents of the XML to the screen. However, what I really want to do is parse through the file, taking out each firstname and putting them in an array. At the very least I would like to know how to 'get at' a node.

The $tree - is it an array? a scalar? a variable? little confused there you see. If it is a scalar, how do I get access to it?

Thanks in advance

Re: Trees in XML
by GrandFather (Saint) on Jun 03, 2008 at 10:44 UTC

    You may find XML::TreeBuilder or XML::Twig are more appropriate for your application. Consider:

    use strict;
    use warnings;
    use XML::TreeBuilder;

    my $xml = <<XML;
    <?xml version='1.0' encoding='UTF-8'?>
    <list name="name list">
      <person>
        <firstname>Paul</firstname>
        <lastname>Rutter</lastname>
        <age>24</age>
      </person>
      <person>
        <firstname>Ruth</firstname>
        <lastname>Brewster</lastname>
        <age>22</age>
      </person>
      <person>
        <firstname>Cas</firstname>
        <lastname>Creer</lastname>
        <age>23</age>
      </person>
    </list>
    XML

    my $root = XML::TreeBuilder->new ();
    $root->parse ($xml);

    my @firstNames = map {$_->as_text ()} $root->look_down (_tag => 'firstname');
    print "TreeBuilder: @firstNames\n";

    use XML::Twig;

    my $twig = XML::Twig->new (twig_roots => { 'person/firstname' => \&pushName});

    @firstNames = ();
    $twig->parse ($xml);
    print "Twig: @firstNames\n";

    sub pushName {
        my ($t, $elt) = @_;
        push @firstNames, $elt->text ();
    }

    Prints:

    TreeBuilder: Paul Ruth Cas
    Twig: Paul Ruth Cas

    Perl is environmentally friendly - it saves trees
      I am aware of other modules out there but the literature I have read suggests that XML::Parser is best suited to me. I am keen to stick to using this module.

      I also do not manage the computer I work on and so installing extra modules is not simple and very time consuming. I would rather stick to XML::Parser. Any help on how to access the data out of the array?
      Would it be like a normal array? In which case I could use a while loop and $data[0] references?

      any help would be super

        TreeBuilder and Twig are pure-perl modules, which use XML::Parser for their work.
Re: Trees in XML
by rovf (Priest) on Jun 03, 2008 at 11:09 UTC
    I would do it with XML::Simple; something like (code not tested!!):

    use XML::Simple qw(:strict);
    ...
    my $h=XMLin('your-xml-code-or-file-name-goes-here',
                forcearray => [ qw(person) ],
                keyattr => []);

    After this, $h->{person}[2]{lastname} would return 'Creer'.
    -- 
    Ronald Fischer <ynnor@mm.st>
      Thanks
      I have tried XML::Simple and found that it mangles the order of the XML when I put it into a hash. I am also experimenting with LibXML but, as I said, I would really like to use XML::Parser in Tree style.

      You see what I would like to do is put the returned value (firstnames) into an array which I can then use later, in another part of the perl script.

      So, any clues on how to use XML::Parser to obtain the firstnames?

      Many thanks

        It is not XML::Simple that is mangling the order; it is an inherent property of a hash that its keys have no order. That is why I used forcearray in my example: it makes the person elements go into an array instead of a hash, so they stay in the same order as in the XML file.
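To make the forcearray point concrete, here is a minimal sketch, assuming XML::Simple is installed and the XML is held in a string rather than a file. The variable names are only for illustration.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<list name="name list">
  <person><firstname>Paul</firstname><lastname>Rutter</lastname><age>24</age></person>
  <person><firstname>Ruth</firstname><lastname>Brewster</lastname><age>22</age></person>
  <person><firstname>Cas</firstname><lastname>Creer</lastname><age>23</age></person>
</list>
XML

# ForceArray keeps every <person> in an array ref (document order);
# KeyAttr => [] stops XML::Simple from folding that array into a hash.
my $h = XMLin( $xml, ForceArray => ['person'], KeyAttr => [] );

# Walk the array in order and pull out each firstname.
my @first_names = map { $_->{firstname} } @{ $h->{person} };

print "@first_names\n";
print "$h->{person}[2]{lastname}\n";
```

The person entries come back in the order they appear in the XML. Only the keys inside each person hash (firstname, lastname, age) are unordered, which does not matter when they are looked up by name.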

        BTW, does the order matter in your example?

        -- 
        Ronald Fischer <ynnor@mm.st>
Re: Trees in XML
by toolic (Bishop) on Jun 03, 2008 at 16:38 UTC
    The $tree - is it an array? a scalar? a variable?
    ref can answer that for you:
    print ref($tree), "\n";

    outputs:

    ARRAY

    Or, if you read the free manual (XML::Parser), it says:

    For elements, the content is an array reference.

    Or, looking at the Dumper output:

    $VAR1 = [ 'list', [ { 'name' => 'name list' }, 0, ' ',

    the square bracket signifies an array ref.

    how do I get access to it?

    perlreftut is a good place to begin figuring out how to access Perl data structures such as this.
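For what it's worth, here is a rough sketch of walking the Tree-style structure with nothing but XML::Parser. It assumes the XML is in a string, and collect_text is a made-up helper name, not part of the module.

```perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<list name="name list">
  <person><firstname>Paul</firstname><lastname>Rutter</lastname><age>24</age></person>
  <person><firstname>Ruth</firstname><lastname>Brewster</lastname><age>22</age></person>
  <person><firstname>Cas</firstname><lastname>Creer</lastname><age>23</age></person>
</list>
XML

# Tree style returns [ tag, content ], where content is
# [ \%attributes, tag1, content1, tag2, content2, ... ].
# Text shows up under the pseudo-tag 0 with a plain string as content.
my $tree = XML::Parser->new( Style => 'Tree' )->parse($xml);

# Hypothetical helper: recursively collect the text of every $wanted element.
sub collect_text {
    my ( $tag, $content, $wanted, $found ) = @_;
    my ( $attrs, @pairs ) = @$content;    # first slot is the attribute hash

    if ( $tag eq $wanted ) {
        my $text = '';
        while ( my ( $t, $c ) = splice @pairs, 0, 2 ) {
            $text .= $c if $t eq '0';     # keep only direct text children
        }
        push @$found, $text;
        return;
    }

    while ( my ( $child_tag, $child ) = splice @pairs, 0, 2 ) {
        # element content is an array ref, text content is a plain string
        collect_text( $child_tag, $child, $wanted, $found ) if ref $child;
    }
}

my @first_names;
collect_text( @$tree, 'firstname', \@first_names );
print "@first_names\n";
```

Because the walk matches on the tag name rather than on fixed indexes, it does not depend on how much whitespace sits between the elements.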

Re: Trees in XML
by Jenda (Abbot) on Jun 03, 2008 at 13:51 UTC
    use strict;
    use XML::Rules;

    my $rules = XML::Rules->new(
        stripspaces => 7,
        rules => {
            _default => 'content',
            person => sub {
                # push the string we build to the array referenced by the {person}
                # key in the parent tag's hash
                return '@person' => "$_[1]->{firstname} $_[1]->{lastname} ($_[1]->{age})"
            },
            list => sub {
                # only interested in the person "attribute"
                # due to the previous rule it's an array ref
                return $_[1]->{person};
                # and this is what the $rules->parse() will return
            }
        }
    );

    my $people = $rules->parse(\*DATA);

    use Data::Dumper;
    print Dumper($people);

    __DATA__
    <?xml version='1.0' encoding='UTF-8'?>
    <list name="name list">
      <person>
        <firstname>Paul</firstname>
        <lastname>Rutter</lastname>
        <age>24</age>
      </person>
      <person>
        <firstname>Ruth</firstname>
        <lastname>Brewster</lastname>
        <age>22</age>
      </person>
      <person>
        <firstname>Cas</firstname>
        <lastname>Creer</lastname>
        <age>23</age>
      </person>
    </list>

      Thanks Jenda, but unfortunately XML::Rules is not installed on my computer, and since the machine is managed I can't install it.

      In the meantime I have come up with this:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use XML::Parser;
      use Data::Dumper;

      my $p = new XML::Parser( Style => 'Tree' );
      my $inputfile = "testxml.xml";
      my $tree = $p->parsefile($inputfile);

      print $tree->[1]->[4]->[4]->[2], "\n";

      which gives me 'Niall'

      As I said, need to stick with XML::Parser!

        As I said, need to stick with XML::Parser!

        Not exactly. You seem to have convinced yourself that 1- you need to only use XML::Parser, 2- the Tree style is the simplest way to get what you want. It seems to me that 1 is false laziness, and 2 is just misguided.

        Learning how to use pure-perl modules, even on a machine where you don't have admin rights, would make it easier for you to write not only this piece of code, but also the next ones.

        Your problem seems well suited to stream processing, whether with XML::Parser or another module. Your code would then be much more resistant to future changes in the XML structure: in your example, [1]->[4]->[4]->[2] is effectively the hardcoded (and some would say obfuscated) path to your target element. If you don't want to hardcode it, you will end up rewriting code that already exists in the likes of XML::Twig, XML::XPath, XML::Rules... With stream processing you would just handle the firstname element and leave the rest alone, so your code would keep working even if the input XML changes, as long as it still includes a firstname element.
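As a sketch of what stream processing could look like with plain XML::Parser (no extra modules), assuming the XML is in a string: the Start, Char and End handlers below only react to firstname elements. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<list name="name list">
  <person><firstname>Paul</firstname><lastname>Rutter</lastname><age>24</age></person>
  <person><firstname>Ruth</firstname><lastname>Brewster</lastname><age>22</age></person>
  <person><firstname>Cas</firstname><lastname>Creer</lastname><age>23</age></person>
</list>
XML

my @first_names;
my $in_firstname = 0;    # true while the parser is inside <firstname>

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ( $expat, $tag ) = @_;
            if ( $tag eq 'firstname' ) {
                $in_firstname = 1;
                push @first_names, '';    # start a new, empty name
            }
        },
        Char => sub {
            my ( $expat, $text ) = @_;
            # Char may fire more than once per element, so append
            $first_names[-1] .= $text if $in_firstname;
        },
        End => sub {
            my ( $expat, $tag ) = @_;
            $in_firstname = 0 if $tag eq 'firstname';
        },
    },
);

$parser->parse($xml);
print "@first_names\n";
```

Nothing here depends on where firstname sits in the document, so moving or wrapping the person elements would not break it.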

        That said, it's your code, you do what you want, just realize that you will get more help if you follow the general advice of using a better tool for the task.

        If you can upload your script you can upload XML::Rules as well. It's pure Perl, a single file and the only dependencies are strict, warnings, Carp and XML::Parser::Expat. The first three are core, I do believe if you have XML::Parser you have the last one.

        You can upload the Rules.pm into /some/path/you/have/access/to/lib/XML, add

        use lib '/some/path/you/have/access/to/lib';
        at the top of your script, and you are effectively done with the installation.

        Sorry, I meant that it gives me:
        Paul