in reply to Moving a tag within text with XML::Twig
(I also had some trouble getting my head around this. The tree that XML::Twig builds contains two kinds of objects: elements, which represent whole markup units, and text nodes, which hold the character content ((P)CDATA), if any, that precedes or follows each element. <update>For that matter, I still don't quite grok "get_xpath", but it seems to work for this case.</update>)
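To see that element/text split directly, here is a minimal sketch that lists the direct children of the first `<a>` element in one of the test documents. It assumes XML::Twig is installed, which the main example also requires. The variable names are only for illustration. The text before and after the footnote appears as separate `#PCDATA` children on either side of the `<b>` element:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new();
$twig->parse('<doc>Doc 2<a>(More text.)<b>next footnote</b> QED</a></doc>');

# Walk the direct children of the first <a> element. Each text run is its
# own child node whose tag is '#PCDATA'; markup is an ordinary element.
my @lines;
my $a_elt = $twig->root->first_child('a');
for my $child ( $a_elt->children ) {
    push @lines, $child->tag . ": [" . $child->text . "]";
}
print "$_\n" for @lines;
$twig->dispose;
```

With that input, `@lines` holds three entries: `#PCDATA: [(More text.)]`, `b: [next footnote]` and `#PCDATA: [ QED]`. The trailing punctuation to be moved therefore lives in the first text node, the previous sibling of `<b>`.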
I hope the example will be clear/easy enough for you to adapt to your particular needs.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my @xmltst = (
    "<doc>Doc 1<a>Some text.<b>footnote text</b></a></doc>",
    "<doc>Doc 2<a>(More text.)<b>next footnote</b> QED</a></doc>",
    "<doc>Doc 3<a>\"More text?\"<b>3rd footnote</b></a><a> QED</a></doc>",
    "<doc>Doc 4<a>('More text!')<b>4th note</b> QED.</a><a>Finis.</a></doc>",
);

for my $doc ( @xmltst ) {
    my $twig = XML::Twig->new();
    $twig->parse( $doc );
    my @b_elts = $twig->get_xpath('//b');
    print "\ndoc contains ", scalar @b_elts, " footnotes:\n";
    $twig->print;

    for my $b ( @b_elts ) {
        # The text just before the footnote is a #PCDATA sibling;
        # skip footnotes that have no preceding text node.
        my $prev = $b->prev_sibling;
        next unless $prev && $prev->is_text;
        my $prev_text = $prev->text;

        # Move any trailing run of punctuation to just after the footnote.
        if ( $prev_text =~ /([.,?!\)\"\']+)$/ ) {
            my $punct  = $1;
            my $offset = length( $prev_text ) - length( $punct );
            my $next   = $b->next_sibling;
            if ( $next && $next->is_text ) {
                # A text node follows the footnote: trim the punctuation from
                # the preceding text and prepend it to the following text.
                $prev->set_text( substr( $prev_text, 0, $offset ) );
                $next->set_text( $punct . $next->text );
            }
            else {
                # Nothing (or an element) follows: split the punctuation off
                # into a new text node and paste it after the footnote.
                my $new = $prev->split_at( $offset );
                $new->cut;
                print "\n created new elt containing: " . $new->text;
                $new->paste( 'after', $b );
            }
        }
    }
    print "\n AFTER EDITING:\n";
    $twig->print;
    $twig->dispose;
    print "\n";
}
```

Output:

```
doc contains 1 footnotes:
<doc>Doc 1<a>Some text.<b>footnote text</b></a></doc>
 created new elt containing: .
 AFTER EDITING:
<doc>Doc 1<a>Some text<b>footnote text</b>.</a></doc>

doc contains 1 footnotes:
<doc>Doc 2<a>(More text.)<b>next footnote</b> QED</a></doc>
 AFTER EDITING:
<doc>Doc 2<a>(More text<b>next footnote</b>.) QED</a></doc>

doc contains 1 footnotes:
<doc>Doc 3<a>"More text?"<b>3rd footnote</b></a><a> QED</a></doc>
 created new elt containing: ?"
 AFTER EDITING:
<doc>Doc 3<a>"More text<b>3rd footnote</b>?"</a><a> QED</a></doc>

doc contains 1 footnotes:
<doc>Doc 4<a>('More text!')<b>4th note</b> QED.</a><a>Finis.</a></doc>
 AFTER EDITING:
<doc>Doc 4<a>('More text<b>4th note</b>!') QED.</a><a>Finis.</a></doc>
```
(Update: after posting this code, I realized it's bad form to use "$b" as a lexically scoped scalar like this, since $a and $b are the package variables that "sort" uses for its comparisons. No harm done in this example, since I'm not using "sort", so I'll leave it as-is. But I should have known better.)
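The punctuation handling itself is plain string arithmetic. Match the trailing run of punctuation, then use `length` to find the offset where that run starts. Here is a minimal core-Perl sketch of just that step, with no XML::Twig involved. The sub name `split_trailing_punct` is hypothetical. The character class is the one used in the code above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into (body, trailing punctuation)
# using the same character class as the XML::Twig example.
# If there is no trailing punctuation, returns the string and ''.
sub split_trailing_punct {
    my ($text) = @_;
    if ( $text =~ /([.,?!\)\"\']+)$/ ) {
        my $punct  = $1;
        my $offset = length($text) - length($punct);   # where the run starts
        return ( substr( $text, 0, $offset ), $punct );
    }
    return ( $text, '' );
}

for my $text ( 'Some text.', '(More text.)', '"More text?"', "('More text!')" ) {
    my ( $body, $punct ) = split_trailing_punct($text);
    print "[$body] + [$punct]\n";
}
```

For the four test strings this prints `[Some text] + [.]`, `[(More text] + [.)]`, `["More text] + [?"]` and `[('More text] + [!')]`. The offset is what gets passed to `substr` or `split_at` in the full example.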
Re^2: Moving a tag within text with XML::Twig, by mirod (Canon) on Sep 26, 2005 at 08:46 UTC

Re^2: Moving a tag within text with XML::Twig, by skillet-thief (Friar) on Sep 26, 2005 at 10:07 UTC