I have never had the need (or the desire) to deal with XML before, at least not in any major way. However, I do now, and I have a few questions.

Firstly, let me give the basic scenario:

The XML in question could be anything from only roughly well-formed to fully well-formed. Secondly, let's assume that the XML elements from the root down to a maximum depth of 4 are known, and that we are erring on the side of caution because the resulting XML::Simple structure may be a large mix of hashes and arrays at different depths.

Thirdly, some of the element values will vary in size (though the total size of the XML tree itself will rarely exceed 2MB), and there will be multiple sub-element containers of the same name. I am using XMLin() without any major modifiers that would change the resulting structure.

The script I have written below works well enough with the test XML in the XML_RAW heredoc. But, before I start going too far, what suggestions does anyone have? Show me some other methods for getting 'concise' data from an XML tree :-)

use strict;
use warnings;
use XML::Simple;

my $xml_raw = <<XML_RAW;
<survey>
  <animals srcurl="blah.whatever.blah" method="ftp">
    <fish name="barramundi" freshwater="yes" saltwater="yes">
      <river>Todd</river>
      <river>Katherine</river>
    </fish>
    <fish name="carp" freshwater="yes" saltwater="no">
      <river>Tilbuster Ponds</river>
      <river>Maribyrnong</river>
      <river>Patterson</river>
      <river>Paterson</river>
      <river>Glenelg</river>
      <river>Murray</river>
      <river>Bunyip</river>
      <river>Campaspe</river>
    </fish>
    <fish name="yellowfin" freshwater="yes" saltwater="no">
      <river>Eucumbene</river>
      <river>Mulla Mulla Creek</river>
      <river>Burrungubugge</river>
      <river>Goobarragandra</river>
      <river>Bombala</river>
      <river>Murray</river>
      <river>Emu Swamp Creek</river>
    </fish>
  </animals>
</survey>
XML_RAW

my $xml_hash_ref = XMLin($xml_raw, KeepRoot=>1);
my %xml_hash = %{$xml_hash_ref};
my ($tl_hk, $tl_hv) = each %xml_hash;   # top-level (root) key and value
my $last_key = '';
my @key_stash = ();                     # path of keys from the root to the current node
my $ref_type = '';
my $fish_species = '';
my $fish_survey_dump = "";

# Just to show you how XML::Simple has structured the XML into a hash
#use Data::Dumper;
#print Dumper(\%xml_hash);

traverse_hash($xml_hash{$tl_hk}, $tl_hk);

# Print out the fish survey information that we wanted.
# I concatenated it into a scalar just for quick display purposes
print "\n\n$fish_survey_dump\n";

sub traverse_hash {
    my ($hash_val, $last_key) = @_;
    push(@key_stash, "$last_key ->");
    for my $key (keys %{$hash_val}) {
        $ref_type = ref($hash_val->{$key}) || "VALUE";
        print "$ref_type: @key_stash $key -> ", $hash_val->{$key}, "\n";
        if($ref_type eq 'HASH') {
            if($key=~/barramundi|carp|yellowfin/) {
                $fish_species = $key;
                concat("\n\n[ Survey information for: $fish_species ]:\n\n");
                concat("Saltwater:" . $hash_val->{$fish_species}{'saltwater'} . "\n");
                concat("Freshwater:" . $hash_val->{$fish_species}{'freshwater'} . "\n");
                concat("Rivers covered in survey:\n\n");
                for my $river (@{$hash_val->{$fish_species}->{'river'}}) {
                    concat("$river\n");
                }
            }
            $last_key = $key;
            # Loop through any sub hashes by calling traverse_hash() again.
            traverse_hash($hash_val->{$key}, $last_key);
            pop(@key_stash);
        }elsif($ref_type eq 'ARRAY') {
            # Array reference
            traverse_array($key, @{$hash_val->{$key}});
        }else{
            # Hash value;
            # ...
        }
    }
}

sub traverse_array {
    my ($key, @array) = @_;
    for my $array_val (@array) {
        print "ARRAY-VAL: @key_stash $key -> ", $array_val, "\n";
        if(ref($array_val) eq 'HASH') {
            # Pass the array's key (not undef) so the path stays defined,
            # and pop it again once the nested hash has been walked.
            traverse_hash($array_val, $key);
            pop(@key_stash);
        }
    }
}

sub concat {
    my $string = $_[0];
    $fish_survey_dump .= $string;
}

The script above gives the following:

HASH: survey -> animals -> HASH(0x1ad678c)
VALUE: survey -> animals -> srcurl -> blah.whatever.blah
VALUE: survey -> animals -> method -> ftp
HASH: survey -> animals -> fish -> HASH(0x1b4262c)
HASH: survey -> animals -> fish -> carp -> HASH(0x1b425f0)
ARRAY: survey -> animals -> fish -> carp -> river -> ARRAY(0x1b42728)
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Tilbuster Ponds
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Maribyrnong
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Patterson
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Paterson
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Glenelg
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Murray
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Bunyip
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Campaspe
VALUE: survey -> animals -> fish -> carp -> saltwater -> no
VALUE: survey -> animals -> fish -> carp -> freshwater -> yes
HASH: survey -> animals -> fish -> barramundi -> HASH(0x1b425e4)
ARRAY: survey -> animals -> fish -> barramundi -> river -> ARRAY(0x1b4277c)
ARRAY-VAL: survey -> animals -> fish -> barramundi -> river -> Todd
ARRAY-VAL: survey -> animals -> fish -> barramundi -> river -> Katherine
VALUE: survey -> animals -> fish -> barramundi -> saltwater -> yes
VALUE: survey -> animals -> fish -> barramundi -> freshwater -> yes
HASH: survey -> animals -> fish -> yellowfin -> HASH(0x1b425fc)
ARRAY: survey -> animals -> fish -> yellowfin -> river -> ARRAY(0x1b4268c)
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Eucumbene
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Mulla Mulla Creek
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Burrungubugge
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Goobarragandra
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Bombala
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Murray
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Emu Swamp Creek
VALUE: survey -> animals -> fish -> yellowfin -> saltwater -> no
VALUE: survey -> animals -> fish -> yellowfin -> freshwater -> yes

[ Survey information for: carp ]:

Saltwater:no
Freshwater:yes
Rivers covered in survey:

Tilbuster Ponds
Maribyrnong
Patterson
Paterson
Glenelg
Murray
Bunyip
Campaspe

[ Survey information for: barramundi ]:

Saltwater:yes
Freshwater:yes
Rivers covered in survey:

Todd
Katherine

[ Survey information for: yellowfin ]:

Saltwater:no
Freshwater:yes
Rivers covered in survey:

Eucumbene
Mulla Mulla Creek
Burrungubugge
Goobarragandra
Bombala
Murray
Emu Swamp Creek
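
For comparison, since the path survey -> animals -> fish is known in advance, the same report could probably be built by indexing straight into the structure instead of walking it recursively. The following is only a minimal sketch under stated assumptions: the hash literal is a hand-written stand-in for what XMLin() returned above (it does not call XML::Simple), the "bream" entry is made up, and survey_report() is a hypothetical helper name. The one structural detail it tries to handle is that, without ForceArray, a fish with a single river comes back as a plain string rather than an array reference.

```perl
use strict;
use warnings;

# ASSUMPTION: hand-written stand-in for the structure XMLin() produced above
# (<fish> folded on its "name" attribute, repeated <river> as an array ref).
# "bream" is invented purely to show the single-<river> case.
my $xml_hash_ref = {
    survey => {
        animals => {
            srcurl => 'blah.whatever.blah',
            method => 'ftp',
            fish   => {
                barramundi => {
                    freshwater => 'yes',
                    saltwater  => 'yes',
                    river      => [ 'Todd', 'Katherine' ],
                },
                bream => {
                    freshwater => 'no',
                    saltwater  => 'yes',
                    river      => 'Glenelg',
                },
            },
        },
    },
};

# Hypothetical helper: build the report by following the known path directly.
sub survey_report {
    my ($data) = @_;
    my $fish   = $data->{survey}{animals}{fish} || {};
    my $report = '';
    for my $species (sort keys %$fish) {
        my $info   = $fish->{$species};
        my $rivers = $info->{river};
        # Normalise: accept an array ref, a lone string, or nothing at all.
        my @rivers = ref($rivers) eq 'ARRAY' ? @$rivers
                   : defined $rivers         ? ($rivers)
                   :                           ();
        $report .= "[ Survey information for: $species ]:\n";
        $report .= "Saltwater: $info->{saltwater}\n";
        $report .= "Freshwater: $info->{freshwater}\n";
        $report .= "Rivers covered in survey:\n";
        $report .= "$_\n" for @rivers;
        $report .= "\n";
    }
    return $report;
}

print survey_report($xml_hash_ref);
```

This avoids hard-coding the species names in a regex, but it only works while the element layout stays as assumed; the recursive traversal is more forgiving when the depth or nesting is not known.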

In reply to XML & data structure parsing fun (XML::Simple ??) by kabeldag
