in reply to Replacement based on pattern
The (..)|(..) bits in the first part capture into either $1 or $2. Because whatever lands in $1 is simply replaced with $1, that first alternative matches tags and puts them back unchanged, so you don't substitute text within them. Your regex can effectively be distilled down to:

```perl
$link_results =~ s,(\Q$term\E),<b>$1</b>,gi;
```
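Here is a small sketch of why the (..)|(..) form matters. It compares the distilled regex with a tag-skipping variant on a made-up input string. The tag pattern `<[^>]+>` is only a rough approximation and will not cope with every piece of HTML (for example a `>` inside an attribute value).

```perl
use strict;
use warnings;

# Hypothetical input: the term also appears inside an attribute value.
my $html = '<a href="hello.html">hello</a>';
my $term = 'hello';

# Distilled version: bolds every occurrence, including the one inside the tag.
(my $naive = $html) =~ s,(\Q$term\E),<b>$1</b>,gi;

# Tag-skipping version: the first alternative swallows whole tags and puts
# them back unchanged ($1), so only text outside tags can reach $2.
(my $safe = $html) =~ s,(<[^>]+>)|(\Q$term\E),defined $1 ? $1 : "<b>$2</b>",gie;

print "$naive\n";   # <a href="<b>hello</b>.html"><b>hello</b></a>
print "$safe\n";    # <a href="hello.html"><b>hello</b></a>
```

The distilled form breaks the `href` attribute; the alternation form leaves the tag intact.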
So the distilled regex just puts bold tags around whatever is in $term. It is case insensitive courtesy of the /i modifier. The \Q activates quotemeta to escape regex specials in $term, and \E deactivates it again. Keep the \E here: without it the closing parenthesis of the capture group would be escaped as well, leaving an unmatched ( in the pattern. See perlman:perlfunc. Here is a little widget to do the sort of thing you want. Just push all the terms you want to bold into @terms.
```perl
use strict;
use warnings;

# test string
my $link_results = '<p>Hello: World Hello hello drewboy Drewboy: Drewboy!</p>';

# define an array of the terms we want to bold
my @terms = ( 'Hello:', 'Drewboy!' );

# make all the terms regex safe by quotemeta-ing them
$_ = quotemeta $_ for @terms;

# join all terms with a pipe | so we find any of them (alternation)
my $bold = join '|', @terms;

# make all the subs, case sensitive and global; anything matched
# by the first alternative is a tag and is put back unchanged
$link_results =~ s#(<[^>]+?>)|($bold)#$1 ? $1 : "<b>$2</b>"#eg;

# proof is in da pudding
print $link_results;
```
To avoid bolding where you don't want to, we switch off case insensitivity and include the punctuation that is apparently present in your data. You could also add \b or \B boundary assertions to help ensure that you only match the desired term; I'll leave that as an exercise for you. Using HTML::Parser to get at the text outside the tags is a more robust approach.
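One possible take on that exercise, as a sketch: plain \b is awkward for terms that start or end in punctuation, because \b only sits between a word and a non-word character. In "Hello: World" there is no boundary between ":" and " ", so /\bHello:\b/ fails there. The version below uses the lookarounds (?<!\w) and (?!\w) instead, which only require that the term is not glued to other word characters. The test string and terms are made up for illustration.

```perl
use strict;
use warnings;

# Made-up test string and terms.
my $link_results = '<p>Hello: World HelloWorld Drewboy Drewboys</p>';
my @terms = ( 'Hello:', 'Drewboy' );
my $bold  = join '|', map { quotemeta } @terms;

# Tags are put back unchanged; a term only matches when it is not
# preceded or followed by another word character.
$link_results =~ s#(<[^>]+?>)|(?<!\w)($bold)(?!\w)#defined $1 ? $1 : "<b>$2</b>"#eg;

print "$link_results\n";
# <p><b>Hello:</b> World HelloWorld <b>Drewboy</b> Drewboys</p>
```

"HelloWorld" and "Drewboys" are left alone, while the punctuation-ending "Hello:" is still matched.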
Corrected a technical inexactitude ;-) thanks to scain
Here is an atonement: this is how you do it right using HTML::Parser. We define a hash of tags whose text content is OK for substitution. We build the array of terms as before. We then use HTML::Parser's start, text and end callbacks to make substitutions selectively: only in the text between the selected tags, and absolutely positively not in the tags themselves.
```perl
package Filter;
use strict;
use warnings;
use base 'HTML::Parser';

my ($filter, $sub_OK);

# tags whose text content may be modified
my @ok_tags = qw( h1 h2 h3 h4 p );
my %ok_tags;
$ok_tags{$_}++ for @ok_tags;

# terms to bold, made regex safe and joined into one alternation
my @terms = ( 'head', 'Parser' );
$_ = quotemeta $_ for @terms;
my $bold = join '|', @terms;

# each start tag: note whether we are inside an OK tag, pass the tag through
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    $sub_OK = exists $ok_tags{$tag} ? 1 : 0;
    $filter .= $origtext;
}

# text between tags: substitute only when allowed
sub text {
    my ($self, $text) = @_;
    $text =~ s#\b($bold)\b#<b>$1</b>#g if $sub_OK;
    $filter .= $text;
}

sub comment {
    # uncomment to not strip comments
    # my ($self, $comment) = @_;
    # $filter .= "<!-- $comment -->";
}

# each end tag: pass it through unchanged
sub end {
    my ($self, $tag, $origtext) = @_;
    $filter .= $origtext;
}

my $parser = Filter->new;
my $html = join '', <DATA>;
$parser->parse($html);
$parser->eof;

print $html;
print "\n\n------------------------\n\n";
print $filter;

__DATA__
<html>
<head>
<title>Title</title>
</head>
<body>
<h1>Hello Parser</h1>
<p>You need HTML::Parser to get ahead</p>
<p>So use your head
<h2>Parser rocks my head!</h2>
<a href="html.head.parser.com">html.head.parser.com</a>
<hr>
<pre>
use HTML::Parser;
head
</pre>
<!-- HTML PARSER ROCKS MY HEAD! -->
</body>
</html>
```
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print