It depends on the structure of the HTML file, but how about using a module like HTML::Parser or HTML::TokeParser to chop up the data file and return the addresses to you? You're likely to go mad trying to write a regex that handles all of the possibilities.
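
For example, here is a rough sketch with HTML::TokeParser. It assumes the addresses you want sit in the href attributes of <a> tags, which may not match your file. The sample HTML and the extract_hrefs helper are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; for a real file, pass the filename to new() instead.
my $html = <<'HTML';
<html><body>
<p>Contact <a href="mailto:alice@example.com">Alice</a> or
<a href="http://example.com/staff">see the staff page</a>.</p>
</body></html>
HTML

# Hypothetical helper: return the href value of every <a> tag in the document.
sub extract_hrefs {
    my ($doc) = @_;

    # A reference to a scalar tells the parser to treat it as the document text.
    my $parser = HTML::TokeParser->new(\$doc)
        or die "Cannot set up the parser";

    my @hrefs;
    # get_tag('a') skips ahead to the next <a ...> start tag, or returns
    # undef at the end of the document.
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # element 1 is the attribute hash
        push @hrefs, $href if defined $href;
    }
    return @hrefs;
}

print "$_\n" for extract_hrefs($html);
```

If the addresses are in plain text rather than in links, you could pull the text out with get_text() or get_trimmed_text() and run a much simpler regex over that instead of over the raw HTML.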