bw has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm using XML::Simple to parse XML that is similar to the following in order to look at various attributes within the <attributes> elements given below. The problem is that Data::Dumper converts the name="varyingName" values into elements.

I can't figure out how to get to the elements that are located further down the XML code tree given this name-to-element conversion behavior.

<dataschemas>
  <dataschema name="varyingName1">
    <attributes>
      <attribute> ... some attributes ... </attribute>
    </attributes>
  </dataschema>
  <dataschema name="varyingName2">
    <attributes>
      <attribute> ... some attributes ... </attribute>
    </attributes>
  </dataschema>
</dataschemas>

Dumper produces output similar to the following snippet:

{
  'dataschemas' => {
    'dataschema' => {
      'varyingName1' => {
        'attributes' => {
          'attribute' => { ... attributes here ... }
        }
      },
      'varyingName2' => {
        'attributes' => {
          'attribute' => { ... attributes here ... }
        }
      }
    }
  }
}

Again, I don't know how to walk down the element (tree?) to get into the "attributes here" area. Since the 'varyingName' (elements?) change, I can't identify them directly. Please help! I've tried the following with no success:

To get to the first 'varyingName' stanza --

foreach my $d (@{$data->{dataschemas}->{dataschema}->[0]->{attributes}->{attribute}}) {
    ... do something here ...
}

Now, something like the following does produce output:

print $data->{dataschemas}->{dataschema}->{varyingName1}->{attributes}->{attribute}->{name};

I went through the XML::Simple doc, searched this site, and Googled for other places to find the answer, with no luck so far. I'd greatly appreciate whatever assistance, insight, and wisdom you can provide to help me solve this problem.

Thank you, BW

Re: XML::Simple--dealing with a variable element
by GrandFather (Saint) on Jul 26, 2006 at 23:55 UTC

    You may find XML::TreeBuilder more suited to your task:

    use warnings;
    use strict;
    use XML::TreeBuilder;

    my $str = <<'STR';
    <dataschemas>
      <dataschema name="varyingName1">
        <attributes>
          <attribute> attr 1 attr 2 </attribute>
        </attributes>
      </dataschema>
      <dataschema name="varyingName2">
        <attributes>
          <attribute> attr 1 attr 3 </attribute>
        </attributes>
      </dataschema>
    </dataschemas>
    STR

    my $xml = XML::TreeBuilder->new;
    $xml->parse($str);

    my @attributes = $xml->find('attribute');
    for (@attributes) {
        # Figure out where the element is
        my @lineage = reverse $_->lineage ();
        my @eNames = map {$_->tag() . ($_->attr('name') ? '(' . $_->attr('name') . ')' : '')} @lineage;
        print join ('/', @eNames), ":\n";

        # Access the element text
        my $oldText = $_->as_text();
        print "$oldText\n";

        # Alter the content
        $_->push_content ("Extra attribute added");
    }

    print $xml->as_HTML ('<>&', ' '); # Looks bogus, but gets indentation

    Prints:

    dataschemas/dataschema(varyingName1)/attributes:
    attr 1 attr 2
    dataschemas/dataschema(varyingName2)/attributes:
    attr 1 attr 3
    <dataschemas>
      <dataschema name="varyingName1">
        <attributes>
          <attribute> attr 1 attr 2 Extra attribute added</attribute>
        </attributes>
      </dataschema>
      <dataschema name="varyingName2">
        <attributes>
          <attribute> attr 1 attr 3 Extra attribute added</attribute>
        </attributes>
      </dataschema>
    </dataschemas>

    Update: added editing sample code


    DWIM is Perl's answer to Gödel
Re: XML::Simple--dealing with a variable element
by rhesa (Vicar) on Jul 26, 2006 at 22:47 UTC
    Chew on this:
    @attrs = map { @{ $_->{attributes}{attribute} } } values %{ $data->{dataschemas}{dataschema} };

    Update: actually, it doesn't look like your attribute nodes result in an array. Do you need the attribute names too, or just the values?

    Maybe this will be clearer:

    foreach my $dataschema ( values %{ $data->{dataschemas}{dataschema} } ) {
        while( my ($name, $value) = each %{ $dataschema->{attributes}{attribute} } ) {
            # do something with the name and value
        }
    }
      I'll crack the book to decipher your code. I simply need the names.

      My end goal for all of this is to figure out existing attribute names being used, give a user the opportunity to select from that existing list or add new ones, write out a new <dataschema> stanza, and then update the config file where the <dataschemas> stanza came from.
        Are you sure the names are unique for all the varying groups? If so, then
        @attrs = map { keys %{ $_->{attributes}{attribute} } } values %{ $data->{dataschemas}{dataschema} };
        should do the trick. I think. I usually get dizzy with data structures this deep ;)
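
        If the names are not unique across groups, a small variation of the same idea can de-duplicate them. The following is only a sketch: the $data literal is a hypothetical stand-in for what XMLin() returned in the question, and the attribute names (title, owner, status) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the structure XMLin() returned in the question;
# the attribute names below are invented for illustration only.
my $data = {
    dataschemas => {
        dataschema => {
            varyingName1 => { attributes => { attribute => { title => 'a', owner  => 'b' } } },
            varyingName2 => { attributes => { attribute => { title => 'c', status => 'd' } } },
        },
    },
};

my $schemas = $data->{dataschemas}{dataschema};

# Walk the schemas by key, so the varying names never have to be hard-coded.
my %seen;
for my $schema_name ( sort keys %$schemas ) {
    my $attribute = $schemas->{$schema_name}{attributes}{attribute};
    $seen{$_}++ for keys %$attribute;
}

# De-duplicated, sorted list of attribute names across all schemas.
my @attrs = sort keys %seen;
print "@attrs\n";
```

        Collecting into a hash first means a name used in several groups shows up only once in @attrs.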
      rhesa: the foreach option is almost working!

      The first element within <attributes> is:
        <attributes>
          <attribute category="" parser="CSVParser" />

      which produces the error:
      Argument "" isn't numeric in each at xmlparser.pl line 58.
        Bad index while coercing array into hash at xmlparser.pl line 58.

      when using the code:

      foreach my $dataschema ( values %{ $data->{dataschemas}{dataschema} } ) {
          while( my ($name, $value) = each %{ $dataschema->{attributes}{attribute} } ){
              # do something with the name and value
              print "\$name is: $name\n\$value is: $value\n";
          }
      }

      When I use keys in place of each, the print line works, but I get stuck in an infinite loop.

      This shows that I don't understand hashes and hash functions very well, at least with respect to the while( my ($name, $value) = each %{ $dataschema->{attributes}{attribute} } ) line, since this error doesn't make sense to me.

      I know that I'm getting hung up on the category="" item, but that's as far as I've gone in my debug process.
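
      A note on the infinite loop seen with keys: it can probably be explained by how Perl evaluates a list assignment used as a loop condition. The sketch below uses a made-up %attribute hash (not the real parsed data) to show that the condition built from keys never becomes false, while the one built from each eventually does.

```perl
use strict;
use warnings;

# Made-up hash standing in for a single parsed <attribute> element.
my %attribute = ( category => '', parser => 'CSVParser' );

# A list assignment in scalar context yields the number of elements on its
# right-hand side. keys returns every key on each call, so this value is
# always 2 here, which is true, and a while loop built on it never ends.
my $condition = ( my ( $name, $value ) = keys %attribute );
print "condition with keys: $condition\n";

# each hands back one (key, value) pair per call and an empty list once the
# hash is exhausted, so the same condition eventually becomes 0 (false).
my $iterations = 0;
while ( my ( $k, $v ) = each %attribute ) {
    $iterations++;
}
print "iterations with each: $iterations\n";
```

      To loop over keys alone, a foreach over keys %attribute is the usual form, since foreach walks the list once instead of re-evaluating it as a condition.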
        Your newer data shows that you usually have an array of "attribute" nodes, while I was assuming it would be a hash. I'm forcing XML::Simple to always make an array of the "attribute" nodes, and have changed the code to accommodate that.

        use XML::Simple;

        $data = XMLin( \*DATA, KeepRoot => 1, ForceArray => [ qw/attribute/ ] );

        @attrs = map { $_->{category} }                        # We want the category value
                 map { @{ $_->{attributes}{attribute} } }      # for each attribute
                 values %{ $data->{dataschemas}{dataschema} }; # in all dataschemas.

        print "@attrs\n";

        __DATA__
        <dataschemas>
          <dataschema name="varyingName1">
            <attributes>
              <attribute category="one" fair="no" />
            </attributes>
          </dataschema>
          <dataschema name="varyingName2">
            <attributes>
              <attribute category="" parser="CSVParser" />
              <attribute category="APS" internalCategory="aps" />
              <attribute category="ASC" internalCategory="asc" />
              <attribute category="ASMT" internalCategory="asmt" />
              <attribute category="LE" internalCategory="l&amp;e" />
              <attribute category="NCO" internalCategory="nco" />
              <attribute category="NEED TITLE" internalCategory="need title" />
              <attribute category="New Need ID" internalCategory="need id" />
              <attribute category="SUMMARY OF DELIVERABLES" parser="TextParser" extract="true" segmentation="hard" />
            </attributes>
          </dataschema>
        </dataschemas>

        A version based on the foreach / while idea:
        foreach my $dataschema ( values %{ $data->{dataschemas}{dataschema} } ) {
            foreach my $attr ( @{ $dataschema->{attributes}{attribute} } ){
                print "category is: ", $attr->{category}, "\n";
            }
        }
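
        If ForceArray is not an option, another way to cope with the hash-or-array difference is to check the reference type before dereferencing. This is a minimal sketch, not part of XML::Simple: attribute_nodes is a hypothetical helper, and the $data literal is hand-written to mimic one schema with a single attribute (a hash ref) and one with several (an array ref).

```perl
use strict;
use warnings;

# Hypothetical helper: return the "attribute" nodes as a flat list, whether
# the parser produced one hash ref or an array ref of hash refs.
sub attribute_nodes {
    my ($node) = @_;
    return ()     unless defined $node;
    return @$node if ref $node eq 'ARRAY';
    return ($node);
}

# Hand-written data mimicking both shapes; not real parser output.
my $data = {
    dataschemas => {
        dataschema => {
            varyingName1 => {
                attributes => { attribute => { category => 'one', fair => 'no' } },
            },
            varyingName2 => {
                attributes => {
                    attribute => [
                        { category => '',    parser           => 'CSVParser' },
                        { category => 'APS', internalCategory => 'aps' },
                    ],
                },
            },
        },
    },
};

my $schemas = $data->{dataschemas}{dataschema};

my @categories;
for my $name ( sort keys %$schemas ) {
    my $attribute = $schemas->{$name}{attributes}{attribute};
    push @categories, map { $_->{category} } attribute_nodes($attribute);
}

print join( ',', @categories ), "\n";
```

        ForceArray is still the simpler fix when you control the XMLin() call; the ref check is mainly useful when the structure is handed to you already parsed.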
Re: XML::Simple--dealing with a variable element
by grantm (Parson) on Jul 27, 2006 at 01:05 UTC

    You may find the data structure easier to deal with if you turn off array folding by specifying KeyAttr => {} in your call to XMLin(). Then, at the top level, the dataschema key will point to an array of elements in the order they occurred in the source document rather than a hash keyed on the value of the name attribute.

    I'm not sure why you've included the ->[0] in your code. There are no arrays in your dumped output.
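
    As a rough illustration of the array-based layout, the sketch below iterates over a hand-written structure. The $data literal is an assumption about what XMLin() would return with KeepRoot => 1 and KeyAttr => {} for two dataschema elements; the category values are made up.

```perl
use strict;
use warnings;

# Assumed shape with array folding turned off: "dataschema" is an array in
# document order and "name" is an ordinary key. Values are invented.
my $data = {
    dataschemas => {
        dataschema => [
            { name => 'varyingName1', attributes => { attribute => { category => 'one' } } },
            { name => 'varyingName2', attributes => { attribute => { category => 'APS' } } },
        ],
    },
};

my @names;
for my $schema ( @{ $data->{dataschemas}{dataschema} } ) {
    # The varying name is now data to read, not a key that must be known.
    push @names, $schema->{name};
    print "$schema->{name}: $schema->{attributes}{attribute}{category}\n";
}
```

    With this layout the ->[0] index from the original attempt becomes meaningful. If a file might contain only one dataschema element, adding ForceArray => [ 'dataschema' ] should keep it an array.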

Re: XML::Simple--dealing with a variable element
by planetscape (Chancellor) on Jul 27, 2006 at 06:46 UTC