A warning: if you're trying to create a semi-general-purpose solution for HTML or XML files, you should use a proper parser module rather than just a regex. Pattern matching against marked-up text is a very fragile approach: changes in attribute order, quoting, case, or whitespace can break the pattern even though the markup means the same thing. For HTML, see HTML::TokeParser::Simple. Its documentation has a bunch of useful examples, and chances are you can just lift one of them and modify it to suit your needs.
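
To make the fragility concrete, here is a minimal sketch that pulls link targets out of a snippet both ways. The sample HTML, the naive pattern, and the variable names are made up for illustration, and it assumes HTML::TokeParser::Simple is installed from CPAN. Treat it as a starting point, not a finished extractor.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input: the first link has another attribute before
# href, the second uses an uppercase tag and single quotes.
my $html = <<'HTML';
<p>See <a class="ext" href="http://example.com/one">one</a>
and <A HREF='http://example.com/two'>two</A>.</p>
HTML

# Naive regex: assumes a lowercase tag with href as the first attribute,
# in double quotes.
my @regex_links = $html =~ /<a href="([^"]+)"/g;

# Parser: walk the token stream and ask each <a> start tag for its href.
my $parser = HTML::TokeParser::Simple->new( string => $html );
my @parser_links;
while ( my $token = $parser->get_token ) {
    next unless $token->is_start_tag('a');
    my $href = $token->get_attr('href');
    push @parser_links, $href if defined $href;
}

print "regex found:  ", scalar(@regex_links),  "\n";
print "parser found: ", scalar(@parser_links), "\n";
print "  $_\n" for @parser_links;
```

On this input the regex should find nothing while the parser picks up both links, since the parser normalizes tag and attribute case and doesn't care about attribute order or quote style. You could keep patching the regex for each new variation, but that's exactly the treadmill the parser saves you from.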