Human has asked for the wisdom of the Perl Monks concerning the following question:

I am grooving on XML::Twig, but I can't seem to master the usage of the replace method. In short, I have a series of rules defined in XML, and I want "duplicate" rules to overwrite the first instance of the rule, in a way that maintains the ordering of the rules. XML::Twig's 'replace' method isn't quite working as I thought it would. My test XML file:
<?xml version="1.0" ?>
<tcf>
  <tweak name = "T1">
    <description>D1</description>
  </tweak>
  <tweak name = "T2">
    <description>D2</description>
  </tweak>
  <tweak name = "T3">
    <description>D3</description>
  </tweak>
  <tweak name = "T2">
    <description>This should overwrite the old T2.</description>
  </tweak>
</tcf>
My code (trimmed way down to just show the problem):
#!/usr/bin/perl

use strict;
use XML::Twig;

# As each tweak tag is processed, this subroutine is called.  Here we will
# see if any previous tweak tag had the same name element.  If so, we will
# overwrite the previous tweak tag with this new tweak tag's contents, then
# delete the new tweak tag.
sub pruner {
    my $this_tweak = $_;
    my $tweakname = $this_tweak->att('name');
    my $exp = "/tcf/tweak[\@name=\"$tweakname\"]";
    my @matches = $this_tweak->get_xpath($exp);

    print "TWEAK: ";
    $this_tweak->print;
    print " (Found $#matches other parsed tweaks with the same name)\n";

    # If the tweak's name is found elsewhere, replace the first
    # instance with the latest one, then delete the latest one.
    if ($#matches == 1) {
        print "\tReplacing\n\t\t";
        @matches[0]->print;
        print "\n\twith\n\t\t";
        $this_tweak->print;
        print "\n";
        $this_tweak->replace(@matches[0]);
    }
}

my $twig = XML::Twig->new(expand_external_ents => 1,
                          twig_handlers => { 'tweak' => \&pruner } );
$twig->parsefile("/home/igo/StormLogic/MythiC/Tweaker/test5.tcf");
print "TWIG:\n";
$twig->print;
print "\n";
When I run that code on that XML file, here is the output:
$ ./twig_replace_test.pl
TWEAK: <tweak name="T1"><description>D1</description></tweak> (Found 0 other parsed tweaks with the same name)
TWEAK: <tweak name="T2"><description>D2</description></tweak> (Found 0 other parsed tweaks with the same name)
TWEAK: <tweak name="T3"><description>D3</description></tweak> (Found 0 other parsed tweaks with the same name)
TWEAK: <tweak name="T2"><description>This should overwrite the old T2.</description></tweak> (Found 1 other parsed tweaks with the same name)
        Replacing
                <tweak name="T2"><description>D2</description></tweak>
        with
                <tweak name="T2"><description>This should overwrite the old T2.</description></tweak>
TWIG:
It should print the Twig at the end, but it gets stuck. Am I abusing the replace method and/or putting the Twig into a broken state? This is one particular instance of the problems I've had trying this sort of thing. I wanted to understand what I'm doing wrong with this tiny example first so I can tackle the larger case on my own. Thanks!

Replies are listed 'Best First'.
Re: XML::Twig replace method behaving counter-intuitively
by Jenda (Abbot) on Dec 01, 2007 at 02:54 UTC

    I know it's not really an answer to your question, but what about:

    use XML::Rules;

    my $parser = XML::Rules->new(
        style => "filter",
        rules => [
            _default => 'raw',
            tweak => sub {
                my ($tag, $attr, undef, undef, $parser) = @_;
                if (exists $parser->{pad}{$attr->{name}}) {
                    %{$parser->{pad}{$attr->{name}}} = %$attr;
                    return;
                } else {
                    $parser->{pad}{$attr->{name}} = $attr;
                    return [$tag => $attr];
                }
            },
            tcf => sub {delete $parser->{pad}; return $_[0] => $_[1]},
        ]
    );

    $parser->filter(\*DATA);

    __DATA__
    <?xml version="1.0" ?>
    <tcf>
      <tweak name = "T1">
        <description>D1</description>
      </tweak>
      <tweak name = "T2">
        <description>D2</description>
      </tweak>
      <tweak name = "T3">
        <description>D3</description>
      </tweak>
      <tweak name = "T2">
        <description>This should overwrite the old T2.</description>
      </tweak>
    </tcf>
    What the code does is build a data structure with the contents of the <tcf> tag (the outermost tag that has a subroutine rule), storing most tags "literally" (whatever that means) and doing something special for the <tweak> tags. For each such tag (after it's fully parsed and the inner tags are processed according to the rules) it checks whether there is a backreference in the $parser->{pad} hash to a previous instance of <tweak> with the same name. If there is, it replaces the contents and attributes of that tag and returns nothing. If there is no such backreference, it creates one and returns an arrayref containing the tag name and the hash with attributes and content (this causes it to be added to the _content of the parent tag and later written into the resulting XML).

    I hope the explanation makes some sense :-) The filter mode of XML::Rules and the way the built datastructures are serialized to XML is a bit hard to explain.

      Thanks for replying, Jenda! I haven't tried XML::Rules before, but the technical reason why I switched to XML::Twig from XML::Simple was that XML::Twig would preserve the order of entries. (I later learned it'd also do cool stuff like DTD processing and a few other things I don't need right away.) Since XML::Simple was hash-based, there's no guarantee that ordering is preserved. From a glance at the XML::Rules page on CPAN, it looks like it may use hashes, too. Is that the case? If so, then I may be unable to use such a solution. It's also possible that I'm missing some way to solve the ordering problem.

        If you try that code you will see that it does preserve the ordering. While XML::Rules does use hashes most of the time, it's really up to you what data you need to preserve from which tag and how. Using that code, the <tweak> tags' data end up in the array referenced by $_[1]->{_content} within the rule specified for the <tcf> tag. How the data from a tag are made available within the $attr hashref of the parent tag depends on the tag's rule.

Re: XML::Twig replace method behaving counter-intuitively
by mirod (Canon) on Dec 04, 2007 at 00:41 UTC

    Sorry for the late answer, I was sick.

    The easy fix is that you need to cut $this_tweak before you can safely call replace. I will add a cut if the element is still part of a twig in the next version, thanks.
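
    A minimal sketch of that fix, kept close to the original pruner sub: the element is cut before replace is called. The names replace_first_duplicate and dedupe_by_cut_and_replace are made up for this illustration, and parsing from an in-memory string instead of the poster's file is an assumption made so the sketch is self-contained.

```perl
use strict;
use warnings;
use XML::Twig;

# Handler: XML::Twig passes the twig and the element that just finished parsing.
sub replace_first_duplicate {
    my ($twig, $this_tweak) = @_;
    my $name    = $this_tweak->att('name');
    my @matches = $this_tweak->get_xpath(qq{/tcf/tweak[\@name="$name"]});

    # Any match other than the current element is an earlier tweak
    # with the same name.
    my ($earlier) = grep { $_ != $this_tweak } @matches;
    if ($earlier) {
        $this_tweak->cut;                 # detach from the tree first
        $this_tweak->replace($earlier);   # then take the earlier tweak's place
    }
    return 1;
}

# Hypothetical wrapper so the sketch runs on a string.
sub dedupe_by_cut_and_replace {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => { tweak => \&replace_first_duplicate },
    );
    $twig->parse($xml);
    return $twig;
}

my $demo = dedupe_by_cut_and_replace(<<'XML');
<tcf>
  <tweak name="T1"><description>D1</description></tweak>
  <tweak name="T2"><description>D2</description></tweak>
  <tweak name="T3"><description>D3</description></tweak>
  <tweak name="T2"><description>This should overwrite the old T2.</description></tweak>
</tcf>
XML
$demo->print;
print "\n";
```

    With the explicit cut in place, the later T2 should end up in the position of the first one, so the order T1, T2, T3 is kept.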

    A couple more things: if you use warnings you will see that @matches[0] is "best written as" $matches[0] (note the $). Also the handler (the pruner sub) receives 2 parameters, the twig and the element, so you could write my( $twig, $this_tweak)= @_ (you might already know that, but I'd rather newcomers read it here ;--). And finally, using get_xpath to get the previous tweak seems a bit wasteful. You could use simply $this_tweak->previous_sibling( qq{tweak[\@name="$tweak_name"]}), or even keep an index of tweak_name => tweak_element, for direct access.
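
    The index idea could look roughly like the sketch below: a hash maps each tweak name to the element currently holding that name's slot in the tree, so no XPath search is needed. The names dedupe_with_index and %first_seen are invented for this example, and the string input is an assumption to keep it self-contained.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse a string and collapse duplicate tweaks,
# using a name => element index for direct access.
sub dedupe_with_index {
    my ($xml) = @_;
    my %first_seen;    # tweak name => element currently in the tree

    my $twig = XML::Twig->new(
        twig_handlers => {
            tweak => sub {
                my ($t, $tweak) = @_;
                my $name = $tweak->att('name');
                if (my $earlier = $first_seen{$name}) {
                    $tweak->cut;                 # cut before replace
                    $tweak->replace($earlier);   # keep the earlier position
                }
                # Either way, this element now owns the slot for $name.
                $first_seen{$name} = $tweak;
                return 1;
            },
        },
    );
    $twig->parse($xml);
    return $twig;
}

my $indexed = dedupe_with_index(
    '<tcf><tweak name="T1"/><tweak name="T2" v="old"/><tweak name="T2" v="new"/></tcf>'
);
$indexed->print;
print "\n";
```

    Updating the index after each replace matters: a third tweak with the same name has to replace the element that is actually in the tree, not the one that was already removed.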

    Does that help?

      Hi, mirod! No apology for being sick is necessary. Thanks very much for replying.

      The cut did the trick! :) FYI, the thing that threw me was the difference in the wording of the documentation for the paste and replace methods, as shown on CPAN. The paste method makes it clear that you can only paste previously-cut elements (of course), but the replace method says "Sometimes it is just not possible tocut[sic] an element then paste another in its place, so replace comes in handy." I interpreted this to mean that replace was analogous to cut and paste, when you wanted to paste over another element. If I may recommend a simple change to the line that precedes it, to give it context: "Replaces an element in the tree." -> "Replaces an element in the tree with a previously cut element." (or whatever the general case is)

      Thanks for the array syntax pointer - it's a bad habit of mine, and I'll debug with -w from now on :)

      I completely missed the information that the current twig is passed into the subroutine, but I'm glad I know now.

      I did run into a problem executing the new code that looks for the previous matching element: $this_tweak->previous_sibling( qq{tweak[\@name="$tweak_name"]})

      $ ./twig_replace_test.pl
      Can't locate object method "previous_sibling" via package "XML::Twig::Elt" at ./twig_replace_test.pl line 14.
      Do I need to define that method myself?

        The next version of XML::Twig will actually cut the element if it hasn't been cut before. That's easier than changing the docs ;--)

        And I am still not at 100%, I meant prev_sibling, not previous, my bad.
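
        For reference, a small sketch of the handler using prev_sibling with a condition. The name dedupe_with_prev_sibling is made up here, and the sketch assumes tweak names contain no double quotes, since the name is interpolated straight into the condition string.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: collapse duplicate tweaks by looking backwards
# among the siblings of the element that just finished parsing.
sub dedupe_with_prev_sibling {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            tweak => sub {
                my ($t, $tweak) = @_;
                my $name = $tweak->att('name');
                # Nearest earlier sibling that is a tweak with the same name.
                my $earlier = $tweak->prev_sibling(qq{tweak[\@name="$name"]});
                if ($earlier) {
                    $tweak->cut;                 # cut before replace
                    $tweak->replace($earlier);
                }
                return 1;
            },
        },
    );
    $twig->parse($xml);
    return $twig;
}

my $result = dedupe_with_prev_sibling(
    '<tcf><tweak name="T1"/><tweak name="T2" v="old"/><tweak name="T3"/><tweak name="T2" v="new"/></tcf>'
);
$result->print;
print "\n";
```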

      (posted the above anonymously by mistake)