BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:
I'm using XML::Twig (3.30) and want to replace all entities in the data with ASCII equivalents, e.g. replace &mdash; (—) with --.
I tried using a _default_ root handler like this:

```perl
my $t = XML::Twig->new(
    twig_roots => {
        'nitf/body/body.head/hedline/hl1' => \&fix_hl1,
        'nitf/body/body.head/hedline/hl2' => \&fix_hl2,
        _default_                         => \&fixup,
    },
    twig_print_outside_roots => 1,   # print everything outside the roots unchanged
    keep_encoding            => 1,   # keep the document's original encoding
);
```

And the subroutine fixup looks like this:

```perl
sub fixup {
    my ($tree, $elem) = @_;
    my $tag = $elem->tag;
    plog("in fixup: tag = $tag");
    # only rewrite the text of the tags listed in %fixup_tag
    if ($fixup_tag{$tag}) {
        my $t = $elem->text;
        conv_chars(\$t);
        $elem->set_text($t);
    }
    $elem->print;
}
```

The %fixup_tag hash has p, byttl, person, hl1, etc.; otherwise I was changing higher-level tags like nitf, and that removed all the internal tags (not a good thing).
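The post doesn't show conv_chars. Here is a minimal sketch of what such a helper might look like in core Perl: it takes a reference to a string, the way fixup calls it, and rewrites a few typographic characters and their named entities to ASCII in place. Only the name conv_chars and its scalar-ref calling convention come from the post; the two mapping tables and their contents are hypothetical, chosen just for illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping from typographic characters to ASCII stand-ins.
my %ascii_for_char = (
    "\x{2013}" => '-',      # en dash
    "\x{2014}" => '--',     # em dash
    "\x{2018}" => "'",      # left single quotation mark
    "\x{2019}" => "'",      # right single quotation mark
    "\x{201C}" => '"',      # left double quotation mark
    "\x{201D}" => '"',      # right double quotation mark
    "\x{2026}" => '...',    # horizontal ellipsis
);

# Hypothetical: the same replacements keyed by entity name, for entities
# that reach the handler as literal text such as "&mdash;".
my %ascii_for_entity = (
    ndash  => '-',
    mdash  => '--',
    lsquo  => "'",
    rsquo  => "'",
    ldquo  => '"',
    rdquo  => '"',
    hellip => '...',
);

# One alternation matching any character that has an ASCII replacement.
my $char_re = join '|', map { quotemeta } keys %ascii_for_char;

# conv_chars(\$text): rewrite the referenced string in place.
sub conv_chars {
    my ($text_ref) = @_;
    $$text_ref =~ s/($char_re)/$ascii_for_char{$1}/g;
    # Replace known named entities; leave unknown ones (e.g. &amp;) alone.
    $$text_ref =~ s/&(\w+);/exists $ascii_for_entity{$1} ? $ascii_for_entity{$1} : "&$1;"/ge;
    return;
}
```

In fixup it would be called exactly as in the post, conv_chars(\$t). One caveat: with keep_encoding => 1, XML::Twig leaves the text in the document's original encoding, so the characters may arrive as undecoded bytes rather than as the code points in the first table; that is why the sketch also handles the named-entity form.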
Question: is there some way to apply a filter to just the character data, leaving the tags and attributes alone?
I have a feeling that a good answer to the above will obviate what comes next:
Problem: after running this, the output is invalid. The input document has <nitf><head><title></title> followed by a bunch of <meta> tags. The output has (after the DOCTYPE) a <title></title>, the <meta> tags, then the <head> with <title> and <meta> duplicated, and then a copy of the whole mess, this time inside the <nitf></nitf> tags!
Obviously, I'm doing something wrong.
Anybody have an idea of what it is?
Re: XML::Twig question and problem
by mirod (Canon) on Jun 28, 2007 at 05:04 UTC
by BenHopkins (Sexton) on Jun 28, 2007 at 06:26 UTC
by mirod (Canon) on Jun 28, 2007 at 07:48 UTC
by BenHopkins (Sexton) on Jun 28, 2007 at 08:19 UTC