Hey monks! I grovel in the gravel and genuinely genuflect gratefulness.

I'm using XML::Twig (3.30) and want to replace all entities in the data with ASCII equivalents, e.g. an em-dash entity such as &mdash; with --.

I tried using a _default_ root handler like this:

my $t = XML::Twig->new(
    twig_roots => {
        'nitf/body/body.head/hedline/hl1' => \&fix_hl1,
        'nitf/body/body.head/hedline/hl2' => \&fix_hl2,
        _default_                         => \&fixup,
    },
    twig_print_outside_roots => 1,
    keep_encoding            => 1,
);

And the subroutine fixup looks like this:

sub fixup {
    my ($tree, $elem) = @_;
    my $tag = $elem->tag;
    plog("in fixup: tag = $tag");
    if ($fixup_tag{$tag}) {
        my $t = $elem->text;
        conv_chars(\$t);
        $elem->set_text($t);
    }
    $elem->print;
}

The %fixup_tag hash lists the tags whose text I convert (p, byttl, person, hl1, etc.). Without it I was also rewriting higher-level tags such as nitf, and calling set_text on those removed all of their child tags (not a good thing).

Question: is there some way to apply a filter to just the character data and leave the tags and attributes alone?

I have a feeling that a good answer to the above will obviate what comes next:

Problem: after running this, the output is invalid. The input document has <nitf><head><title></title> followed by a bunch of <meta> tags. The output has (after the DOCTYPE) a <title></title>, then the <meta> tags, then the <head> with <title> and <meta> duplicated inside it, and then a copy of the whole mess again, this time inside the <nitf></nitf> tags!

Obviously, I'm doing something wrong.

Anybody have an idea of what it is?


In reply to XML::Twig question and problem by BenHopkins
