bladestonight has asked for the wisdom of the Perl Monks concerning the following question:

I was trying to change some elements in my tree into attributes for other elements. Whilst creating the same subroutines over and over with only the new parent node changing, I thought there has to be a better way.

I created a generic subroutine which deletes the node and inserts it as an attribute for another element but I get the following error:

Can't call method "set_att" without a package or object reference at mine.pl line 40, <DATA> chunk 2.

I'm sure I'm missing something fundamental, here is some code to explain what I'm doing (see expected_doc for the result I'm looking for):

#!/usr/bin/perl -w
use strict;
use XML::Twig;

$/="\n\n";
my $doc = <DATA>;           # the original data set
my $expected_doc = <DATA>;  # result with elements changed to attributes

my $twig= new XML::Twig(    # create the twig
    pretty_print => 'indented',
    twig_roots   => {
        'elt_att'  => sub { addAtt(@_,'elt') },
        'selt_att' => sub { addAtt(@_,'subelt') },
    },
);
$twig->parse($doc);
$twig->flush;

# Finished.
exit(0);

# Give a decent error message if we wrote to stdout and had disk full.
END { close(STDOUT) || die "ERROR: can't close stdout: $!\n" }

#===============================================================================
# Subroutines
#-------------------------------------------------------------------------------
sub addAtt {
    my( $t, $att,$parent)= @_;
    my $e_parent = $t->findnodes($parent);
    $e_parent->set_att($att->gi,$att->trimmed_text);
    $att->delete;
    $t->flush;
}

__DATA__
<doc>
  <elt elt_class="class1">
    <subelt subelt_class="sclass1"><content id="content1"/></subelt>
    <elt_att att="elt_att1"></elt_att>
  </elt>
  <elt elt_class="class2">
    <subelt subelt_class="sclass2"><content id="content2"/></subelt>
    <selt_att att="selt_att1"></selt_att>
  </elt>
  <elt elt_class="class3">
    <subelt subelt_class="sclass3"><content id="content3"/></subelt>
    <elt_att att="elt_att1"></elt_att>
  </elt>
</doc>

<doc>
  <elt elt_class="class1" elt_att="elt_att1">
    <subelt subelt_class="sclass1">
      <content id="content1"/>
    </subelt>
  </elt>
  <elt elt_class="class2">
    <subelt subelt_class="sclass2" selt_att="selt_att1">
      <content id="content2"/>
    </subelt>
  </elt>
  <elt elt_class="class3" elt_att="elt_att1">
    <subelt subelt_class="sclass3">
      <content id="content3"/>
    </subelt>
  </elt>
</doc>

Replies are listed 'Best First'.
Re: XML::Twig creating generic subroutine for attributes
by mirod (Canon) on Apr 19, 2007 at 09:21 UTC

    There are several problems with your code:

    If you use twig_roots, then anything outside of the roots is ignored, except for the root of the XML. So if you don't set elt as a root, you won't see it. I think the simplest fix is to replace twig_roots with twig_handlers, so you can access anything in the XML. If the XML you are processing is too big to fit in memory, you can set a twig handler on elt that will flush it. That would be the best place to flush in any case, as the logical tree you work on is an elt.

    Then you use findnodes, which returns a list of elements that match the XPath query (and a query like 'elt' will always return an empty list; it is a syntactically valid but useless query, hence the error you get when you call set_att on an empty $e_parent). Instead, what you want is either the parent elt or the previous subelt, and you can get either one with the prev_elt method. So here is a modified addAtt function that produces the results you want. I also renamed $att to $elt, as I was getting confused. Also, you don't want to use the trimmed_text of the element as the value of the new attribute, but instead the value of the att attribute of the element you are processing:

    sub addAtt {
        my( $t, $elt, $parent)= @_;
        my $e_parent = $elt->prev_elt($parent);
        if( !$e_parent) {
            die "no parent '$parent' for element '", $elt->gi, "'\n";
        }
        $e_parent->set_att($elt->gi,$elt->att( 'att'));
        $elt->delete;
        $t->flush;
    }
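
    For anyone who wants to see the pieces together, here is a minimal sketch that combines the two changes described above: twig_handlers in place of twig_roots, and prev_elt in place of findnodes. The convert wrapper and the lower-case add_att name are hypothetical, introduced only for illustration. The sketch also returns the document with sprint instead of flushing, which assumes the input fits in memory; for large files the flush approach described above would be the one to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Move the 'att' attribute of $elt onto the nearest preceding $target
# element, using $elt's own tag name as the new attribute name.
sub add_att {
    my ($twig, $elt, $target) = @_;
    my $dest = $elt->prev_elt($target)
        or die "no '$target' before element '", $elt->gi, "'\n";
    $dest->set_att($elt->gi, $elt->att('att'));
    $elt->delete;    # the element itself is no longer needed
}

# Hypothetical wrapper (not from the thread): parse an XML string and
# return the modified document as a string.
sub convert {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            elt_att  => sub { add_att(@_, 'elt') },
            selt_att => sub { add_att(@_, 'subelt') },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

print convert(<<'XML'), "\n";
<doc>
  <elt elt_class="class1">
    <subelt subelt_class="sclass1"><content id="content1"/></subelt>
    <elt_att att="elt_att1"></elt_att>
  </elt>
  <elt elt_class="class2">
    <subelt subelt_class="sclass2"><content id="content2"/></subelt>
    <selt_att att="selt_att1"></selt_att>
  </elt>
</doc>
XML
```

    Passing the target tag name as the third argument is what keeps one subroutine generic: each handler only states which earlier element should receive the attribute.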
Re: XML::Twig creating generic subroutine for attributes
by Jenda (Abbot) on Apr 19, 2007 at 09:34 UTC

    The problem is that the

    my $e_parent = $t->findnodes($parent);
    doesn't return anything. It does look a bit strange, but I don't know XML::Twig well enough to suggest the correct syntax. This is how the solution would look with XML::Rules:
    use XML::Rules;

    my $parser = XML::Rules->new(
        style => 'filter',
        ident => ' ',
        rules => {
            _default => 'as array trim',
            elt => sub {delete $_[1]->{_content};$_[0] =>$_[1]},
            'elt_att,selt_att' => sub {$_[0] => $_[1]->{att}},
        }
    );
    $parser->filter(\*DATA);

    __DATA__
    <doc>
      <elt elt_class="class1">
        <subelt subelt_class="sclass1"> <content id="content1"/></subelt>
        <elt_att att="elt_att1"></elt_att>
      </elt>
      <elt elt_class="class2">
        <subelt subelt_class="sclass2"><content id="content2"/></subelt>
        <selt_att att="selt_att1"></selt_att>
      </elt>
      <elt elt_class="class3">
        <subelt subelt_class="sclass3"><content id="content3"/></subelt>
        <elt_att att="elt_att1"></elt_att>
      </elt>
    </doc>

    The rule for <elt> is necessary because XML::Rules in filter mode copies everything it parses as is until it encounters a tag with a custom rule. Without a special rule for <elt>, the module would copy the opening <elt> tag and the <subelt> tag with its contents, and then process the <elt_att>. At that point it could no longer go back to the <elt> to add the result of the <elt_att> rule to the <elt> tag, so it would create an <elt_att>elt_att1</elt_att> tag instead.

      I only learnt about XML::Twig a week ago, so I don't really want to learn another module so soon just for this subroutine duplication problem. I'll keep XML::Rules in mind for the future though.

      Mirod, your solution worked once I replaced twig_roots with twig_handlers and swapped my addAtt for your routine. Thanks very much.