Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I am having a brain meltdown here. I want to search a string and wrap each occurrence of a certain word in tags, but only if that word doesn't appear inside an HTML tag. For example:

$string = "<img src=\"test.jpg\"> highlight the word 'word' in this sentence";
would become

<img src="test.jpg"> highlight the <b>WORD</b> '<b>WORD</b>' in this sentence
but:

$string = "<img src=\"test.jpg\"> highlight the word <img src=\"word.jpg\">";
would become
<img src="test.jpg"> highlight the <b>WORD</b> <img src="word.jpg">
Note that the 'word' in the image file name did not get <b> tags around it.

I know it can be done, but I'm having trouble coming up with the proper regex for it. My regex knowledge is limited. So far I have:
$replacementString = "<b>WORD</b>";
$string =~ s/word/$replacementString/ig;
which obviously replaces all occurrences, inside tags or not. What am I missing?

Replies are listed 'Best First'.
Re: Replacing text NOT in an HTML tag
by Joost (Canon) on Aug 19, 2005 at 18:32 UTC
Re: Replacing text NOT in an HTML tag
by wfsp (Abbot) on Aug 19, 2005 at 18:42 UTC
    Joost beat me to it!

    Here's one way

    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $str = q|<img src=\"test.jpg\"> highlight the word <img src=\"word.jpg\">|;

    my $tp = HTML::TokeParser->new(\$str)
        or die "Couldn't parse $str: $!";
    $tp->unbroken_text(1);

    my ($html, $start);
    while (my $tag = $tp->get_token) {
        if ($tag->[0] eq 'T') {
            # Text token: the only place where the substitution is applied
            my $t = $tag->[1];
            $t =~ s|word|<b>WORD</b>|g;
            $html .= $t;
        }
        else {
            # Start tags, comments and end tags are copied through unchanged
            $html .= $tag->[4] if $tag->[0] eq 'S';
            $html .= $tag->[1] if $tag->[0] eq 'C';
            $html .= $tag->[2] if $tag->[0] eq 'E';
        }
    }
    print "$html\n";
    __DATA__
    ---------- Capture Output ----------
    > "C:\Perl\bin\perl.exe" parse_str.pl
    <img src=\"test.jpg\"> highlight the <b>WORD</b> <img src=\"word.jpg\">
    > Terminated with exit code 0.
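
For comparison, the original question asked for a regex. The following is a minimal regex-only sketch and is not taken from any reply in this thread. It relies on one assumption: every tag has the form <...> with no literal '>' inside an attribute value or comment. The helper name highlight_outside_tags is hypothetical. The idea is to let the pattern match either a whole tag (which is put back untouched) or the word itself (which is wrapped), so the word is never matched inside a tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: wrap every occurrence of $word
# that lies outside an HTML tag in <b>...</b>, upper-cased.
# Assumption: tags look like <...> and contain no literal '>' inside them.
sub highlight_outside_tags {
    my ($string, $word) = @_;
    $string =~ s{
        ( < [^>]* > )          # $1: a whole tag, copied through unchanged
      |
        ( \b \Q$word\E \b )    # $2: the word itself, found in plain text
    }{
        defined($1) ? $1 : '<b>' . uc($2) . '</b>'
    }gixe;
    return $string;
}

print highlight_outside_tags(
    q{<img src="test.jpg"> highlight the word <img src="word.jpg">}, 'word'
), "\n";
# prints: <img src="test.jpg"> highlight the <b>WORD</b> <img src="word.jpg">
```

Because the tag alternative comes first and the regex engine scans left to right, a tag is always consumed as a unit before the word inside it can match. If the assumption above does not hold (for example, a '>' inside a quoted attribute or a comment), a real parser such as HTML::TokeParser is likely the safer choice.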