I am not 100% sure what you are trying to achieve, but you may want to check out the WWW::Mechanize and HTML::TokeParser modules. They may suit your requirements of fetching HTML pages and extracting information from them.
Yes, I think regexps are only good for the simplest digs into HTML or XML. Of course, you can write a very sophisticated regexp, but, IMHO, the result is hard to change later and more painful to work with.
So I suggest using an HTML parser, especially if you _really_ cannot get a better data source than HTML.
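
To make the parser suggestion concrete, here is a minimal sketch of pulling links out of a page with HTML::TokeParser. The helper name `extract_links` and the sample HTML string are made up for illustration. In a real script the HTML would probably come from something like WWW::Mechanize's `content` method after a `get`, which is not shown here.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return a list of [href, link text] pairs
# found in an HTML string.
sub extract_links {
    my ($html) = @_;

    # Passing a reference to a scalar makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @links;
    # get_tag('a') skips ahead to the next <a ...> start tag.
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # element 1 is the attribute hash
        next unless defined $href;     # ignore anchors without an href

        # Collect the text up to the closing </a>, whitespace trimmed.
        my $text = $parser->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# Made-up sample input, just to show the call.
my $html = '<p><a href="/one">First <b>link</b></a> and '
         . '<a href="/two">second</a></p>';

printf "%s => %s\n", @$_ for extract_links($html);
```

The nested `<b>` inside the first link is the kind of thing that tends to trip up a quick regexp, while the parser just walks over it.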