in reply to Scanning a html document....

In addition to plaid's answer, check out HTML::LinkExtor, which sounds like a good fit for what you want. From the docs:
HTML::LinkExtor is an HTML parser that extracts links from an HTML document. The HTML::LinkExtor is a subclass of HTML::Parser. This means that the document should be given to the parser by calling the $p->parse() or $p->parse_file() methods.
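
For what it's worth, here is a minimal sketch of how that might look in practice. The sample HTML, the @links array, and the callback body are my own assumptions for illustration, not something from your document. The callback passed to new() is called once per link-bearing tag, with the tag name followed by attribute name/value pairs.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample input; replace with your own document.
my $html = <<'END_HTML';
<html><body>
<a href="http://www.example.com/">Example</a>
<img src="logo.png">
</body></html>
END_HTML

my @links;

# The callback receives the tag name, then the link attributes
# of that tag as name => value pairs.
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, [ $tag, $_, $attr{$_} ] for sort keys %attr;
});

$p->parse($html);   # use $p->parse_file($filename) to read from a file
$p->eof;            # signal end of document to the parser

print join(' ', @$_), "\n" for @links;
```

If you leave the callback out, the links are collected inside the parser instead, and you can fetch them afterwards with $p->links. Passing a base URL as the second argument to new() should make the extracted links absolute.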