in reply to Is this the best way to use HTML::TreeBuilder to bold text in an HTML document?

I don't know about effective, or safe, but why create a big old tree when all you are doing is simple filtering? Your memory overhead must already be considerable since you are working within an HTML::Mason framework, so why add to the burden? I would use HTML::Parser or HTML::TokeParser to approach this problem.

I'll update this node with some code in about 5 min (I'm not on my computer).

#!/usr/bin/perl -w
use strict;
#use warnings;
use HTML::TokeParser;

undef $/;
print processHTML(<DATA>);

sub processHTML
{
    my $tp = HTML::TokeParser->new(\$_[0]);
    my $return;

    while (my $token = $tp->get_token)
    {
        # after the shift, the indexes below are one less than in the docs
        my $ttype = shift @{ $token };

        if($ttype eq "S")                  # start tag?
        {
            $return .= $token->[3];
        }
        elsif($ttype eq "T")               # text?
        {
            $token->[0] =~ s/(perl)/\<B\>$1\<\/B\>/ig;
            $return .= $token->[0];
        }
        elsif($ttype =~ /^(?:C|D)$/)       # comment? declaration?
        {
            $return .= $token->[0];
        }
        elsif($ttype =~ /^(?:E|PI)$/)      # end tag? processing instruction?
        {
            $return .= $token->[1];
        }
    } # endof while (my $token = $tp->get_token)

    undef $tp;
    return $return;
}

__END__
<html>
<head>
<title>This title contains Perl but does not get changed.</title>
</head>
<body>
<p>This is some text containing the term 'perl'.</p>
<ol>
<li>Unix</li>
<li>Perl</li>
<li>Linux</li>
</ol>
<p>Notice how the term perl in the following link doesn't change, but
the text does. <a href="http://www.perlmonks.org">Perlmonks.org</a></p>
</body>
</html>
Update:
After visiting this thread again and looking a little closer at the HTML after __END__, I saw <title>This title contains Perl but does not get changed.</title>. Well, I kind of ignored that portion ;), but it's easy to include a sentinel in the above loop.

Aww, what the heck, here goes: one way to do it with HTML::(Toke)Parser.

#!/usr/bin/perl -w
# boldemhtml.pl
use strict;
use warnings;
use HTML::Parser;
use HTML::TokeParser;

# Despite the name, this records where the DATA section starts,
# so the handle can be rewound and read a second time.
my ${Where_does_data_end} = tell DATA;
undef $/;
print processHTML(<DATA>);

seek DATA, ${Where_does_data_end}, 0;
print 'x' x 30, " HERE GO a little faster version \n";
print processHTML2(<DATA>);
exit;

sub processHTML
{
    my $tp = HTML::TokeParser->new(\$_[0]);
    my $return;
    my $SENTINEL = 1;                      # true until </title> has been seen

    while (my $token = $tp->get_token)
    {
        my $ttype = shift @{ $token };

        if($ttype eq "S")                  # start tag?
        {
            $return .= $token->[3];
        }
        elsif($ttype eq "T")               # text?
        {
            $token->[0] =~ s/(perl)/\<B\>$1\<\/B\>/ig unless $SENTINEL;
            $return .= $token->[0];
        }
        elsif($ttype =~ /^(?:C|D)$/)       # comment? declaration?
        {
            $return .= $token->[0];
        }
        elsif($ttype =~ /^(?:E|PI)$/)      # end tag? processing instruction?
        {
            $SENTINEL = 0 if $token->[0] eq 'title';
            $return .= $token->[1];
        }
    } # endof while (my $token = $tp->get_token)

    undef $tp;
    return $return;
}

sub processHTML2
{
    my $SENTINEL = 1;                      # true until </title> has been seen
    my $p = HTML::Parser->new( api_version => 3);
    my $return;

    $p->handler(default => sub {
                    $return .= $_[0];
                    $SENTINEL = 0 if $_[1] eq 'end' and $_[2] eq '/title';
                    return;
                }
                ,'text,event,tag');

    # the default handler could also be rewritten as
    #
    #   $p->handler(default => sub {
    #                   $return .= $_[0];
    #                   $SENTINEL = 0 if $_[0] =~ m{</title>}i;
    #                   return;
    #               }
    #               ,'text');
    #
    # this version would only have a default handler

    $p->handler(text => sub {
                    $_[0] =~ s!(perl)!<B>$1</B>!ig unless $SENTINEL;
                    $return .= $_[0];
                    return;
                }
                ,'text');

    $p->parse($_[0]);
    $p->eof;                               # flush any text still buffered
    undef $p;
    return $return;
}

__END__
<html>
<head>
<title>This title contains Perl but does not get changed.</title>
</head>
<body>
<p>This is some text containing the term 'perl'.</p>
<ol>
<li>Unix</li>
<li>Perl</li>
<li>Linux</li>
</ol>
<p>Notice how the term perl in the following link doesn't change, but
the text does. <a href="http://www.perlmonks.org">Perlmonks.org</a></p>
</body>
</html>
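
A variation on the sentinel idea, offered only as a rough sketch: instead of one flag that is cleared at </title>, track whether the parser is currently inside <title> by toggling on both the start and the end tag. That way text that comes before the title (if any) is bolded too. The sub name bold_outside_title and the variable $in_title are made up for illustration; the only things taken from HTML::TokeParser are new, get_token and the documented token layouts (no shift this time, so the indexes match the docs).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (the name is illustrative, not from the post above):
# wrap every "perl" found in text tokens in <B>...</B>, except while the
# parser is between <title> and </title>.
sub bold_outside_title {
    my ($html) = @_;
    my $tp       = HTML::TokeParser->new(\$html);
    my $out      = '';
    my $in_title = 0;    # true only between <title> and </title>

    while (my $token = $tp->get_token) {
        my $type = $token->[0];

        if ($type eq 'S') {         # ["S", $tag, $attr, $attrseq, $text]
            $in_title = 1 if $token->[1] eq 'title';
            $out .= $token->[4];
        }
        elsif ($type eq 'E') {      # ["E", $tag, $text]
            $in_title = 0 if $token->[1] eq 'title';
            $out .= $token->[2];
        }
        elsif ($type eq 'T') {      # ["T", $text, $is_data]
            my $text = $token->[1];
            $text =~ s{(perl)}{<B>$1</B>}ig unless $in_title;
            $out .= $text;
        }
        elsif ($type eq 'PI') {     # ["PI", $token0, $text]
            $out .= $token->[2];
        }
        else {                      # ["C", $text] or ["D", $text]
            $out .= $token->[1];
        }
    }
    return $out;
}

print bold_outside_title(
    '<title>Perl</title><p>perl at <a href="http://www.perlmonks.org">Perlmonks</a></p>'
), "\n";
```

Only text tokens are rewritten, so the href inside the start tag is passed through untouched, same as in the versions above.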

 
______crazyinsomniac_____________________________
Of all the things I've lost, I miss my mind the most.
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"
