in reply to Pulling all instances of a regex out

Have you tried a module that's intended for parsing HTML, such as HTML::TokeParser (for which I believe there is even a wrapper, HTML::TokeParser::Simple)? It hands you the document as a stream of tokens (start tags, end tags, text, comments), and I'm pretty sure it would get you what you want.
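For what it's worth, here's a minimal sketch of the HTML::TokeParser approach. The original question doesn't say what's being pulled out, so this assumes it's the href of every <a> tag, and the sample HTML is made up. The uppercase, single-quoted tag and the link inside a comment are the sort of cases a hand-rolled regex tends to get wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical input; substitute the HTML you are actually scanning.
my $html = <<'HTML';
<p>See <a href="http://example.com/one">one</a> and
<A HREF='/two'>two</A>.</p>
<!-- <a href="/commented-out">skip</a> -->
HTML

# new() accepts a filename, a filehandle, or a reference to a string.
my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";

my @links;
# get_tag('a') skips ahead to the next <a> start tag and returns
# [$tag, \%attr, \@attrseq, $text], or undef when input runs out.
# Tag and attribute names come back lowercased, and comments are
# separate tokens, so the commented-out link is never seen here.
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};
    next unless defined $href;
    # Collect the link text up to the matching </a>, whitespace trimmed.
    push @links, [ $href, $p->get_trimmed_text('/a') ];
}

print "$_->[0]\t$_->[1]\n" for @links;
```

Swap 'a' and href for whatever tag and attribute you're really after.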

Writing your own regexes to parse HTML has been described as a bad idea by some pretty lofty monks (merlyn comes to mind)...