The big problem is that the tool you're using (regular expressions) is terribly bad at parsing things like HTML: nested tags, comments, and all the variations in attribute quoting and ordering will trip up any pattern sooner or later. I'd strongly recommend one of the HTML parsing modules (HTML::Parser or HTML::TreeBuilder) to do this job. They take care of the messy business of actually understanding the HTML and let you concentrate on stuff like "I want the contents of this tag".
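To give a rough idea of what "I want the contents of this tag" looks like with a parser, here is a minimal sketch using HTML::TreeBuilder from CPAN. The sample markup and the helper name `tag_contents` are made up for illustration; swap in your own HTML and tag name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, not part of core Perl

# Hypothetical helper: return the text inside every <$tag> element.
sub tag_contents {
    my ($html, $tag) = @_;

    # Parse the whole document into a tree of HTML::Element objects.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down() in list context returns every element matching the criteria;
    # as_text() returns an element's text with nested markup stripped.
    my @texts = map { $_->as_text } $tree->look_down(_tag => $tag);

    $tree->delete;        # release the tree explicitly
    return @texts;
}

# Made-up sample input, including a nested tag that would confuse a naive regex.
my $html = '<html><body><p class="intro">Hello, <b>world</b>!</p>'
         . '<div><p>Second paragraph</p></div></body></html>';

print "$_\n" for tag_contents($html, 'p');
```

The nested `<b>` and the `<p>` buried inside a `<div>` need no special handling here, which is the point: the parser works out the structure and you just ask for the elements you want.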