Re: Scanning a html document....

In addition to plaid's answer, have a look at HTML::LinkExtor, which sounds like a good fit for what you want. From the docs:

    HTML::LinkExtor is an HTML parser that extracts links from
    an HTML document.  The HTML::LinkExtor is a subclass of
    HTML::Parser. This means that the document should be given
    to the parser by calling the $p->parse() or $p->parse_file()
    methods.
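Something along these lines might get you started. It is only a minimal sketch: the sample HTML and the @hrefs array are made up for illustration, and I am assuming you only care about the href of each <a> tag. The callback passed to new() is called with the tag name followed by its link attributes as name/value pairs.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Sample markup standing in for the document being scanned
# (invented for this example, not from the original question).
my $html = <<'HTML';
<html><body>
<a href="http://www.example.com/">Example</a>
<img src="logo.png">
<a href="/about.html">About</a>
</body></html>
HTML

my @hrefs;

# The callback receives the tag name followed by attribute => value
# pairs for each link-bearing element the parser encounters.
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
});

$p->parse($html);
$p->eof;    # signal end of input so any buffered text is processed

print "$_\n" for @hrefs;
```

If the HTML lives in a file, $p->parse_file($filename) can replace the parse/eof pair. Passing a base URL as the second argument to new() should give you absolute URLs back instead of the raw attribute values.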
