in reply to Re: Twig Mixed Content Child Text Replace Issues
in thread Twig Mixed Content Child Text Replace Issues

Thanks for the replies.

So I'm attempting to alter the text of <paragraph> without affecting the <bold> tags inside it. Right now, the '->text' method returns the concatenated text of all sub-elements. Hence, when I set the <paragraph> text, the code replaces the <bold> elements with a plain string and I lose the <bold> tags.

I need to maintain the integrity of the <bold> tags while making string substitutions within the <paragraph> element. I will not always know in advance how the text and child elements of the <paragraph> element will look, so the content will vary.

I need to be able to find all of the text children of a mixed content <paragraph> element and substitute within the text string without disturbing other child elements.

Example:
Input:

    <paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a number of child tags.</paragraph>

Expected output:

    <paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a quantity of child tags.</paragraph>

Code:

    use XML::Twig;

    my $file = '<paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a number of child tags.</paragraph>';

    my $twig = new XML::Twig(TwigHandlers => {'paragraph' => \&paragraph},
                             TwigRoots    => {paragraph => 1});
    $twig->parse($file);
    $twig->print;

    sub paragraph {
        my ($twig, $para,) = @_;
        my $para_text = $para->text;
        &choiceReplace($para,$para_text);
    }

    sub choiceReplace {
        my ($para,$para_text) = @_;
        my $search = "number";    #setting search and replace for example
        my $replace = "quantity";

        #locate each occurrence of search term and prompt user to replace
        foreach ($para_text =~ /$search/) {
            my $new_version = $para_text;
            my $offset = 0;
            my $new_offset = 0;
            my $result = index($para_text, $search);
            my $new_result = index($new_version, $search);
            $offset = $result;
            $new_offset = $new_result;
            my $l = length ($search);

            #loop through string search results while found
            while (($result != -1) && ($new_result != -1)) {
                #create visuals for user to accept/deny match
                print "\n\nCurrent Version:\n $para_text";
                my $replace_match = "**[[$replace]]**";
                substr($new_version,$new_offset,$l) = $replace_match;
                print "\n\nMatched Version:\n $new_version";
                print "\nWould you like to make this change (y or n)? \n";
                chomp($change=<STDIN>);
                if ($change eq "n"){
                    my $nm_result = rindex($new_version, $replace_match);
                    my $nm_l = length ($replace_match);
                    my $no_match = "[DENIED]";
                    substr($new_version,$nm_result,$nm_l) = $no_match;
                }
                elsif ($change eq "y") {
                    my $nm_result = rindex($new_version, $replace_match);
                    my $nm_l = length ($replace_match);
                    my $no_match = "[CHANGED]";
                    substr($new_version,$nm_result,$nm_l) = $no_match;
                    substr($para_text,$offset,$l) = $replace;
                }
                #update search starting point
                $result = index($para_text, $search, $offset + 1);
                $offset = $result;
                $new_result = index($new_version, $search, $new_offset);
                $new_offset = $new_result;
            }
            #set text for <paragraph>
            @_[0]->set_text($para_text);
        }
    }

Re^3: Twig Mixed Content Child Text Replace Issues
by mirod (Canon) on Feb 08, 2011 at 16:18 UTC

    The fact that you want the user input for each substitution makes the problem a _lot_ more tricky. Otherwise you could simply use subs_text as in my previous answer.

    In this case you should do the substitution on the text of the '#TEXT' (or '#PCDATA') children of the paragraph. But then what happens if the text contains the string you want to replace twice, and you only want to replace the second one? The logic becomes quite a bit more complex. Off the top of my head, I would modify the regexp to let it skip the appropriate number of occurrences of the string to replace.
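
    As a rough illustration of the "skip the appropriate number of occurrences" idea, here is a minimal sketch in plain Perl. The helper name replace_nth and its arguments are made up for this example (they are not part of XML::Twig), and it assumes the keyword should only match as a whole word:

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th (1-based) whole-word
# occurrence of $search in $text, leaving every other occurrence alone.
sub replace_nth {
    my ($text, $search, $replace, $n) = @_;
    my $kw   = qr/\b\Q$search\E\b/;   # the keyword, quoted, as a whole word
    my $skip = $n - 1;                # occurrences to step over first

    # Capture everything up to the $n-th occurrence, then swap that one.
    # If there are fewer than $n occurrences, nothing matches and the
    # text is returned unchanged.
    $text =~ s/\A((?:.*?$kw){$skip}.*?)$kw/$1$replace/s;
    return $text;
}

my $text = 'a number here and a number there';
print replace_nth($text, 'number', 'quantity', 2), "\n";
```

    An interactive tool could call something like this once per accepted occurrence, taking care that the occurrence count shifts after each accepted change.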

    A non-interactive version that you could use as a basis would be:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Test::More tests => 1;
        use XML::Twig;

        my $doc = '<paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a number of child tags.</paragraph>';
        my $exp = '<paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a quantity of child tags.</paragraph>';

        # I got a little fancy here to allow several keywords to replace
        # the keywords are grouped in a regexp, sorted by inverse length so the alternation works properly
        my $replace = { number => 'quantity' };
        my $keywords= join( '|', map { "\Q$_\E" } sort { length$b <=> length $a } keys %$replace);

        my $t=XML::Twig->new( twig_roots => { paragraph => \&subs_word })->parse( $doc);

        is( $t->sprint, $exp, 'one change') ;

        exit;

        sub subs_word {
            my( $t, $para)= @_;
            # only the direct text children are touched, so <bold> elements survive
            foreach my $text_elt ($para->children( '#TEXT')) {
                my $text= $text_elt->text;
                if( $text_elt->text=~ m{\b($keywords)\b}) {
                    $text=~ s{\b($keywords)\b}{$replace->{$1}}g;
                    $text_elt->set_text( $text);
                }
            }
        }
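
    An interactive variant could be sketched along the following lines. This is only one possible shape, not a tested recipe for the original poster's tool: the helper replace_in_text_children and the $decide callback are hypothetical names introduced here, and the callback stands in for a real prompt so that the example runs without user input.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: offer each whole-word match of $search found in the
# direct text children of $para to $decide->($match, $chunk_text); the match
# is replaced only when $decide returns true. Child elements are untouched.
sub replace_in_text_children {
    my ($para, $search, $replace, $decide) = @_;
    foreach my $text_elt ($para->children('#TEXT')) {
        my $original = $text_elt->text;
        (my $updated = $original) =~ s{\b(\Q$search\E)\b}{
            my $match = $1;
            $decide->($match, $original) ? $replace : $match
        }ge;
        $text_elt->set_text($updated) if $updated ne $original;
    }
}

my $xml = '<paragraph>A number of <bold>number</bold> tags and a number more.</paragraph>';

my $n = 0;    # counts the matches offered so far
my $twig = XML::Twig->new(
    twig_handlers => {
        paragraph => sub {
            my ($t, $para) = @_;
            # Stand-in for a real prompt: accept every second match.
            replace_in_text_children($para, 'number', 'quantity',
                                     sub { ++$n % 2 == 0 });
        },
    },
);
$twig->parse($xml);
print $twig->sprint, "\n";
```

    To prompt for real, the callback could print the match with its surrounding text and return true when the line read from STDIN starts with "y". Because only the paragraph's own text children are visited, the "number" inside <bold> is never offered for replacement.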
      Thanks mirod, your advice is spot on.

      I have modified my code (using much less elegant methods) to prompt the user for each #TEXT match. It's working beautifully, but it will take me a bit more work to get to a regex solution as clean as yours.

      Twig is turning out to be a very useful tool.