There are many ways to do this. A regex alone isn't the answer, because Perl doesn't support variable-width look-behind assertions.
I think the easiest way to do this is to use HTML::Parser with only text tokens. Have your text handler do the substitutions.
Another way would be to strip the HTML tags and store them in an array or something. But matching HTML is harder than it seems, so I'd go for HTML::Parser.
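Here's a minimal sketch of the HTML::Parser approach, not a finished solution. The substitution itself (foo to bar) and the name substitute_text are made up for illustration; plug in whatever replacement you actually need. The text handler changes only text tokens, and the default handler copies tags, comments and everything else through untouched.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: apply a substitution to text only, never to markup.
sub substitute_text {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Text tokens: do the substitution, but leave <script>/<style>
        # content (reported with is_cdata true) alone.
        text_h => [
            sub {
                my ($text, $is_cdata) = @_;
                $text =~ s/\bfoo\b/bar/g unless $is_cdata;   # example substitution (assumption)
                $out .= $text;
            },
            'text, is_cdata',
        ],
        # Tags, comments, declarations, etc.: pass the original source through.
        default_h => [ sub { $out .= shift }, 'text' ],
    );

    # Deliver each run of text in one piece, so a match can't be split
    # across two handler calls.
    $p->unbroken_text(1);

    $p->parse($html);
    $p->eof;
    return $out;
}

print substitute_text('<a href="foo.html">foo</a>'), "\n";
```

With that input, the foo inside the href attribute is left alone and only the link text is changed, which is exactly what a plain s/// over the whole document can't guarantee.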
2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$