in reply to How to access results of XML::Simple?

I started with Perl and XML at the same time, and I well remember how confusing the syntax of hashes was back then. It gets easier fast. Hopefully the code below will help clarify things.

A few things to note.

You need a root node in your XML (I used <genes></genes>); this lets you treat multiple elements as part of a single data structure. The root name does not show up in the Data::Dumper output, and you don't use it when referencing the structure.
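To make the point about the root name concrete, here is a minimal sketch that parses the same two genes twice: once with the defaults and once with XML::Simple's KeepRoot option switched on. The variable names ($doc, $default, $with_root) are mine, not from the original question, and the folding on 'id' relies on the default keyattr behaviour.

```perl
use strict;
use warnings;
use XML::Simple;

# Same sample data as in the post, held in a string instead of __DATA__.
my $doc = <<'XML';
<genes>
  <gene id="3" label="gene_of_interest" />
  <gene id="4" label="Another_gene_of_interest" />
</genes>
XML

# Default: the root element name is discarded.
my $default = XMLin($doc);

# KeepRoot => 1: the root element name becomes the top-level key.
my $with_root = XMLin($doc, KeepRoot => 1);

print $default->{gene}{3}{label}, "\n";           # no 'genes' in the path
print $with_root->{genes}{gene}{4}{label}, "\n";  # 'genes' is now part of the path
```

If you prefer the root name in your references, KeepRoot is the switch to look at; otherwise just leave it out of the path as described above.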

You can use either a variable or a hardcoded string as a hash key.
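A small sketch of the different ways to write the keys. The hash here is built by hand to mimic what XMLin() returns for the sample data, so it runs without XML::Simple; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for the structure XMLin() returns for the sample XML.
my $genes = {
    gene => {
        3 => { label => 'gene_of_interest' },
        4 => { label => 'Another_gene_of_interest' },
    },
};

my $id = 4;

my $by_literal  = $genes->{gene}{3}{label};        # hardcoded keys
my $by_variable = $genes->{gene}{$id}{label};      # key held in a variable
my $quoted      = $genes->{'gene'}{'3'}{'label'};  # quotes are optional for simple keys

print "$by_literal\n$by_variable\n$quoted\n";
```

All three forms walk the same nested hash; which one you use is mostly a matter of taste and of whether the key is known in advance.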

Data::Dumper is very useful for understanding the structure returned by XML::Simple. Don't try it on a very large XML file, though: it will churn for ages and may then run out of memory. Use it on small samples, and comment it out before testing your code on a large file.
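One way to keep the dumps manageable, sketched under my own assumptions: dump only the branch you are interested in, and hide the call behind a flag rather than commenting it out. The $DEBUG switch is hypothetical, and the hash is again a hand-built stand-in for the XMLin() result.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;  # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;  # more compact indentation than the default

my $DEBUG = 1;  # hypothetical switch: set to 0 before running on a large file

# Hand-built stand-in for the structure XMLin() returns for the sample XML.
my $genes = {
    gene => {
        3 => { label => 'gene_of_interest' },
        4 => { label => 'Another_gene_of_interest' },
    },
};

# Dump a single branch instead of the whole structure.
my $dump = Dumper($genes->{gene}{3});
print $dump if $DEBUG;
```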

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $data = do { local $/; <DATA> };
my $xml  = XMLin($data);

print Dumper($xml), "\n\n";

for my $id (3, 4) {
    print $xml->{gene}{$id}{'label'}, "\n";
}

__DATA__
<?xml version="1.0" ?>
<genes>
<gene id = "3" label = "gene_of_interest" />
<gene id = "4" label = "Another_gene_of_interest" />
</genes>

Output

C:\test>220720
$VAR1 = {
          'gene' => {
                      '3' => {
                               'label' => 'gene_of_interest'
                             },
                      '4' => {
                               'label' => 'Another_gene_of_interest'
                             }
                    }
        };


gene_of_interest
Another_gene_of_interest

C:\test>

Examine what is said, not who speaks.

Re: Re: XML::Simple
by grantm (Parson) on Dec 18, 2002 at 02:12 UTC

    Instead of this:

    my $xml = XMLin($data);

    I'd recommend this:

    my $genes = XMLin($data,
        keyattr    => { gene => 'id' },
        forcearray => [ 'gene' ],
    );

    My reasoning:

    • the return value from XMLin() is a Perl data structure not XML, so $xml is not a very descriptive name
    • you are 'folding' the list of gene elements into a hash keyed on the 'id' - it's more readable/maintainable to spell out your requirements rather than relying on the defaults
    • any element listed in 'keyattr' should usually also be listed in 'forcearray'
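
    A minimal sketch of why the last point matters, assuming a document that happens to contain only one gene element. The variable names are illustrative. With the defaults, a lone element is not treated as a list, so there is nothing to fold and the lookup by id no longer works; with both options spelled out, the structure has the same shape however many genes there are.

    ```perl
    use strict;
    use warnings;
    use XML::Simple;

    # Only one <gene> this time.
    my $one_gene = '<genes><gene id="3" label="gene_of_interest" /></genes>';

    # Defaults: the single element is returned as a plain hash of its attributes.
    my $default = XMLin($one_gene);

    # Explicit options: 'gene' is always a list, and always folded on 'id'.
    my $explicit = XMLin($one_gene,
        keyattr    => { gene => 'id' },
        forcearray => [ 'gene' ],
    );

    printf "default lookup by id works?  %s\n",
        exists $default->{gene}{3} ? 'yes' : 'no';
    printf "explicit lookup by id gives: %s\n",
        $explicit->{gene}{3}{label};
    ```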

    The XML::Simple strict mode node expands on this theme.

    Also, you could give XMLin() the DATA globref directly:

    XMLin(\*DATA, ...);

      Agreed. I actually had both options on in my first version, but then decided that, as the process works without them, they might only serve to confuse the OP, so I removed them.

      Possibly the wrong thing to do in hindsight, but I figured less noise, more clarity?

      The same goes for using globs. I still steer clear of them personally, but it would be better in production code.

