Rather than extracting this data from the HTML with a regex, you may want to use the HTML::TokeParser module to pull out the data you need. If you do a Super Search on this topic, you will find a lot of posts that back using this module rather than a regex.
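As a rough illustration of what that might look like, here is a minimal sketch that pulls the links and their text out of a page. The sample markup and the extract_links helper are made up for the example, since I don't know what your HTML looks like. It assumes HTML::TokeParser is installed from CPAN (it ships with the HTML-Parser distribution).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup; substitute the page you are actually parsing.
my $html = <<'HTML';
<html><body>
<p>See <a href="http://www.perlmonks.org/">PerlMonks</a> and
<a href="http://search.cpan.org/">CPAN
   search</a>.</p>
</body></html>
HTML

# extract_links is an illustrative helper, not part of the module.
# It returns a list of [href, link text] pairs found in the document.
sub extract_links {
    my ($doc) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $p = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";

    my @links;
    # get_tag('a') skips ahead to the next <a> start tag and returns
    # an array ref: [tag name, attribute hash, attribute order, raw text].
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};
        next unless defined $href;    # skip anchors without an href

        # Collect the text up to the closing </a>, with runs of
        # whitespace collapsed and the ends trimmed.
        my $text = $p->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

printf "%s => %s\n", @$_ for extract_links($html);
```

Note that the second link's text is split across two lines in the source, which is the sort of thing that tends to trip up a regex but that the parser handles without any special casing.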