There are many ways of doing so. Regexes aren't the answer, because Perl doesn't support arbitrary variable-width lookbehind assertions.
A parser such as HTML::Parser splits the document into tokens. Have your text handler do the substitutions.
Another way would be to strip the HTML tags and store them in an array or something similar. But matching HTML is harder than it seems, so I'd go for HTML::Parser.
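
Here is a rough sketch of the HTML::Parser approach, not a finished solution. The sub name substitute_text and the foo-to-bar substitution are made up for illustration; swap in whatever replacement you actually need. The text handler only sees text between tags, and everything else (tags, comments, declarations) goes through the default handler unchanged, so attribute values are never touched.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: apply a substitution to text nodes only,
# leaving tags and their attributes exactly as they were.
sub substitute_text {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Text between tags: this is where the substitution happens.
        text_h => [
            sub {
                my ($text, $is_cdata) = @_;
                # Skip the contents of <script>/<style> (reported as CDATA).
                $text =~ s/\bfoo\b/bar/g unless $is_cdata;   # example substitution
                $out .= $text;
            },
            'text, is_cdata',
        ],
        # Tags, comments, declarations, etc.: copy the original source through.
        default_h => [ sub { $out .= shift }, 'text' ],
    );

    # Deliver each run of text in one piece rather than in chunks.
    $parser->unbroken_text(1);

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print substitute_text('<a href="foo.html" class="foo">foo and food</a>'), "\n";
# prints: <a href="foo.html" class="foo">bar and food</a>
```

Note that the handler gets the raw text, so entities like &amp;amp; arrive undecoded. If your pattern needs to match decoded characters, ask for 'dtext' instead of 'text' in the argspec and re-encode before appending.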