Don't try to reinvent the wheel; use one of the CPAN modules for HTML parsing. You could easily do this with HTML::TokeParser::Simple. Someone may well have already done what you're trying to do, so check out HTML::LinkAdd too.
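Here's a rough sketch of how the HTML::TokeParser::Simple approach might look. I'm assuming the goal is to run a substitution only on text that isn't inside an anchor; the helper name rewrite_text_outside_anchors, the sample HTML and the substitution are all made up for illustration, so adjust to taste.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of any module): pass every text token that is
# NOT inside an <a> element through $callback, and copy everything else
# (tags, comments, anchor text) to the output unchanged.
sub rewrite_text_outside_anchors {
    my ($html, $callback) = @_;

    # Passing a scalar reference parses the string rather than a file.
    my $parser = HTML::TokeParser::Simple->new(\$html);

    my $anchor_depth = 0;   # greater than zero while inside <a> ... </a>
    my $out          = '';

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('a')) {
            $anchor_depth++;
        }
        elsif ($token->is_end_tag('a')) {
            $anchor_depth-- if $anchor_depth > 0;
        }

        if ($token->is_text && $anchor_depth == 0) {
            $out .= $callback->($token->as_is);
        }
        else {
            $out .= $token->as_is;
        }
    }
    return $out;
}

# Example: bold the word "perl", but leave existing link text alone.
my $html   = '<p>perl is fun, see <a href="http://www.perl.org/">perl</a></p>';
my $result = rewrite_text_outside_anchors($html, sub {
    my ($text) = @_;
    $text =~ s/\bperl\b/<b>perl<\/b>/g;
    return $text;
});

print "$result\n";
```

Because the regex only ever sees text tokens, it can't touch attribute values such as the href, which is the part that's painful to get right with a single regex over the whole document. Note that this sketch only tracks anchors; text inside script or style elements would need the same kind of depth counter if you want to skip it.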