in reply to Re^3: XML::Twig not finding an element's parent's text
in thread XML::Twig not finding an element's parent's text
Thanks. Working with parent elements containing the target element was the way to go, rather than aiming directly at the target element itself. See the first handler below.
There are a lot of handlers in this script, but here is the pair relevant to my original question:
    my $xml = XML::Twig->new(
        pretty_print    => 'nsgmls',    # nsgmls for parsability
        output_encoding => 'UTF-8',
        twig_roots      => { 'office:body' => 1 },
        twig_handlers   => {
            # link anchors (text:bookmark) must be handled before
            # processing the internal links
            '*[text:bookmark]' => \&handler_bookmark,
    . . .

    $xml = XML::Twig->new(
        pretty_print    => 'nsgmls',
        empty_tags      => 'html',
        output_encoding => 'UTF-8',
        twig_roots      => { 'office:body' => 1 },
        twig_handlers   => {
            # links (text:a) must be handled separately from link targets
            'text:a' => \&handler_links,
    . . .

    sub handler_bookmark {
        my ($twig, $bookmark) = @_;
        my @bmk = $bookmark->children('text:bookmark');
        foreach my $bk (@bmk) {
            my $l = $bk->trimmed_text;
            my $t = $l;
            $t =~ s/\s/_/g;
            my $anchor = $bk->att('text:name');
            $bookmarks{$anchor}{'label'}  = $l;
            $bookmarks{$anchor}{'target'} = $t;
            $bk->set_text("\n { ".$anchor.' }');
            $bk->parent->merge($bk);
        }
    }

    sub handler_links {
        my ($twig, $link) = @_;
        my $href = $link->att('xlink:href');
        $href =~ s/^\#//;
        my $l = $bookmarks{$href}{'label'};
        my $t = $bookmarks{$href}{'target'};
        if (! $l) {
            $l = $link->trimmed_text;
            $link->set_text("[$href $l]\n");
        } else {
            $link->set_text("[$t $l]\n");
        }
        $link->parent->merge($link);
    }
    . . .
These two handler subroutines are each used in a separate parsing pass, for a total of two passes. Strangely, two passes seem to be faster than one pass with all the handlers in a single object. The first pass collects a hash of link targets and their labels. The second pass applies those to the links pointing at those targets.
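For anyone who wants to try the two-pass idea in isolation, here is a minimal, self-contained sketch. It is not my actual script. The tiny XML sample, the placeholder namespace URIs, the choice of 'text:h' as the element that contains the bookmark, and taking the label from that parent's text are all assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical miniature of an ODF body; a real content.xml is far larger
# and uses the real ODF namespace URIs rather than these placeholders.
my $odf = <<'XML';
<office:body xmlns:office="urn:example:office" xmlns:text="urn:example:text" xmlns:xlink="urn:example:xlink">
<text:h><text:bookmark text:name="sec1"/>First Section</text:h>
<text:p>See <text:a xlink:href="#sec1">here</text:a> and <text:a xlink:href="#nowhere">there</text:a>.</text:p>
</office:body>
XML

my %bookmarks;

# Pass 1: handle the element that CONTAINS the bookmark, so its full text
# is available, and record a label and a target for each bookmark name.
XML::Twig->new(
    twig_handlers => {
        'text:h' => sub {
            my ($twig, $elt) = @_;
            my $label = $elt->trimmed_text;
            (my $target = $label) =~ s/\s/_/g;
            for my $bk ($elt->children('text:bookmark')) {
                $bookmarks{ $bk->att('text:name') } =
                    { label => $label, target => $target };
            }
        },
    },
)->parse($odf);

# Pass 2: rewrite each link using the hash collected in pass 1,
# falling back to the raw href and link text for unknown targets.
my $pass2 = XML::Twig->new(
    twig_handlers => {
        'text:a' => sub {
            my ($twig, $link) = @_;
            (my $href = $link->att('xlink:href') // '') =~ s/^#//;
            my $text = $link->trimmed_text;
            my $bm   = $bookmarks{$href};
            $link->set_text(
                $bm ? "[$bm->{target} $bm->{label}]" : "[$href $text]"
            );
        },
    },
);
$pass2->parse($odf);

my $out = $pass2->sprint;
print $out;
```

Because the hash is complete before the second parse starts, a link can point at a bookmark that appears later in the document and still be resolved.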