Wise Monks, I have to turn to you for a piece of advice on the following problem.

I'm using XML::Twig to parse an XML file. The output should simply be the path of each element in the DOM tree. I've written a handler that is associated with all start tags and that does precisely that. Since I don't want the leading '/', I strip it using substr. No problem so far. However, I also want the XML tags in lowercase, and this is where things start to get interesting.

I've included two Perl programs, one that parses an actual XML file, the other simulating the behavior of the handler on ordinary text data to try and isolate the problem. The output of the latter seems fine, while the output of the former is clearly incorrect.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {'_all_' => \&start_tag}
    );
    $twig->parse(*DATA);

    sub start_tag {
        my ($t, $e) = @_;
        my $str = $e->path();
        print substr(lc($str), 1), "\n";
        print lc(substr($str, 1)), "\n\n";
    }

    __DATA__
    <A>
      <a>
        <B>blah blah</B>
        <b>blah blah blah</b>
        <b>blah <a/> blah</b>
      </a>
      <b/>
    </A>

The output produced is:

    a/a/b
    a/a/b

    a/a/b
    a/a/b

    a/a/b
    a/a/b/a

    a/a/b
    a/a/b

    a/a
    a/a

    a/b
    a/b

    a
    a

Note the third group, where the two lines differ: substr(lc($str), 1) yields a/a/b instead of the expected a/a/b/a. Below is the attempt to reproduce this outside the context of XML parsing:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        chomp($_);
        print_str($_);
    }

    sub print_str {
        my ($str) = @_;
        print substr(lc($str), 1), "\n";
        print lc(substr($str, 1)), "\n\n";
    }

    __DATA__
    /A/a/B
    /A/a/b
    /A/a/b/a
    /A/a/b
    /A/a
    /A/b
    /A

which produces the expected results below:

    a/a/b
    a/a/b

    a/a/b
    a/a/b

    a/a/b/a
    a/a/b/a

    a/a/b
    a/a/b

    a/a
    a/a

    a/b
    a/b

    a
    a

It would seem that something very odd happens within the XML handler, as if a variable with a fixed length (the length it had in the first invocation) were reused between calls to the handler.
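
One way to probe the fixed-length suspicion is to print the length and the internal UTF-8 flag of each intermediate string, and to compare that with a variant that avoids calling substr on the result of lc. The sketch below is only that: describe and strip_lc are hypothetical helper names, not part of XML::Twig. It also rests on an unverified assumption, namely that the paths handed to the handler differ from the plain strings read from DATA in some internal property such as the UTF-8 flag. I have not checked whether the regex variant actually sidesteps the problem on the affected setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Twig): summarise a string so that
# two supposedly identical values can be compared field by field.
sub describe {
    my ($label, $str) = @_;
    return sprintf "%-16s len=%d utf8=%d value=%s",
        $label, length($str), (utf8::is_utf8($str) ? 1 : 0), $str;
}

# Hypothetical helper: lowercase a path and drop its leading '/',
# working on a fresh copy with a regex instead of substr(lc(...), 1).
sub strip_lc {
    my ($path) = @_;
    my $copy = lc $path;
    $copy =~ s{^/}{};
    return $copy;
}

# Plain strings, as in the simulation; inside the handler the same
# calls could be made on the value returned by $e->path().
for my $path ('/A/a/B', '/A/a/b/a') {
    print describe('original',   $path), "\n";
    print describe('lc',         lc $path), "\n";
    print describe('substr(lc)', substr(lc($path), 1)), "\n";
    print describe('strip_lc',   strip_lc($path)), "\n\n";
}
```

If the lengths or flags reported inside the handler differ from those reported for the same paths read from DATA, that would at least narrow down where the two programs diverge.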

I'd be grateful if someone could shed some light on this. Thanks in advance, -gjb-

Update: given that this seems to be a version-specific issue, I should mention that the results above were obtained using XML::Twig 3.23 (i.e. the latest version) on Perl 5.8.7 built for cygwin-thread-multi-64int (i.e. the standard version that can be installed using Cygwin's installer).


In reply to XML::Twig handlers weirdness? by gjb
