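You need to look at the '~literal' pseudo-element of HTML::Element. Its text attribute is emitted verbatim by as_HTML, without entity encoding, so it can carry a ready-made entity such as &mdash;.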
If I understand your requirements correctly, the following code should do what you want.
```perl
#!/usr/bin/perl
use warnings;
use strict;

my $html = <<'EOHTML';
<html>
<body>
<p>The relationship can be expressed by the following equation:<br>
<blockquote>
y <= (7x<sup>3</sup> + 3x<sup>2</sup>)/((x - 3)(x - 5)
</blockquote>
<p>Of course x != 3 & x != 5 -- That goes without saying.
</body>
</html>
EOHTML

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content($html);

# Look for double-dashes in the text,
# and change them to &mdash;es.
$tree->look_down(sub {
    my $element = shift;
    my @content = $element->content_list();

    @content = map {
        if (ref) {
            $_    # Skip non-text children
        } else {
            # Break up the text child into pieces on the --'s
            my @texts = split /(--)/, $_;

            # Replace those --'s with a ~literal
            # pseudo-element containing an &mdash;
            @texts = map {
                $_ eq '--'
                    ? HTML::Element->new('~literal', text => '&mdash;')
                    : $_
            } @texts;

            @texts;
        }
    } @content;

    # Replace the old content with the modified content
    $element->splice_content(0, scalar $element->content_list, @content);

    # Never report a match, so look_down keeps visiting every element
    return 0;
});

print $tree->as_HTML;
```

Update: The (formatted) output of the above is this:

```html
<html>
<head>
</head>
<body>
<p>The relationship can be expressed by the following equation:<br>
<blockquote>
y &lt;= (7x<sup>3</sup> + 3x<sup>2</sup>)/((x - 3)(x - 5)
</blockquote>
<p>Of course x != 3 &amp; x != 5 &mdash; That goes without saying.
</body>
</html>
```
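If you want to reuse the splitting step on a single element, it can be pulled out into a small sub. This is only a sketch: `dashes_to_literals` is a name I made up (it is not part of HTML::Element), and the sample text is invented. It shows that an ordinary `&` in a text child still gets entity-encoded by as_HTML, while the `~literal` node is passed through untouched.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper (not part of HTML::Element): split one text child
# on '--' and return the pieces, with each '--' replaced by a ~literal
# pseudo-element holding a ready-made entity.
sub dashes_to_literals {
    my ($text) = @_;
    return map {
        $_ eq '--'
            ? HTML::Element->new('~literal', text => '&mdash;')
            : $_
    } grep { length } split /(--)/, $text;
}

# Invented sample content: one text child with a raw '&' and a '--'.
my $p = HTML::Element->new('p');
$p->push_content('Fish & chips -- with salt');

# Element children pass through as-is; text children go through the helper.
my @new = map { ref($_) ? $_ : dashes_to_literals($_) } $p->content_list;
$p->splice_content(0, scalar($p->content_list), @new);

my $out = $p->as_HTML;
print $out, "\n";
```

The `grep { length }` just drops empty strings that split can produce when the text starts with `--`; the original code above works without it.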
bbfu
Black flowers blossum
Fearless on my breath
In reply to Re: processing text nodes using HTML::Element
by bbfu
in thread processing text nodes using HTML::Element
by Anonymous Monk