If you need to extract links from an HTML file, read the documentation of HTML::LinkExtor - that module should do what you need.
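For illustration, here is a minimal sketch of how HTML::LinkExtor could be used, assuming the HTML is already in a string. The helper name extract_links and the sample markup are made up for this example, and it only looks at the href of <a> tags; adjust to whatever your task actually needs.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href targets of all <a> tags in an HTML string.
sub extract_links {
    my ($html) = @_;

    # With no callback given, the parser collects the links internally
    # and hands them back via the links() method.
    my $parser = HTML::LinkExtor->new;
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        # Each entry is an array ref: [ $tag, $attr_name => $url, ... ]
        my ($tag, %attrs) = @$link;
        push @urls, $attrs{href} if $tag eq 'a' && defined $attrs{href};
    }
    return @urls;
}

# Sample input (made up); the <img> link is ignored by the filter above.
my @found = extract_links('<a href="http://example.com/a">A</a> <img src="pic.png">');
print "$_\n" for @found;
```

If you pass a base URL as the second argument to new(), the module returns absolute URLs, which is handy when the page uses relative links.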
As for checking all URLs from a TXT (i.e. not HTML) file, just read them in line by line (via some sort of loop), and check them in turn. If a page you check gives you more URLs to check, simply push them onto the end of the array you use to keep the URLs you still have to check.
Be sure to keep a hash of all the URLs already checked, and skip adding them to the "URLs to check" array - that way, you will avoid traversing a loop of links like A->B->C->A.
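The array-plus-hash idea could be sketched roughly like this. The names crawl and $fetch_links are hypothetical: $fetch_links stands in for whatever code fetches a URL and returns the links found there, and the in-memory %graph only simulates the A->B->C->A loop so the sketch runs without a network. Note that it marks a URL as seen at the moment it is queued, a small variation that also keeps duplicates out of the array.

```perl
use strict;
use warnings;

# Hypothetical sketch: visit every URL reachable from @start exactly once.
# $fetch_links is a callback that takes a URL and returns the URLs it links to.
sub crawl {
    my ($fetch_links, @start) = @_;

    my %seen;                                   # URLs already queued or checked
    my @queue = grep { !$seen{$_}++ } @start;   # URLs still to check
    my @order;                                  # URLs in the order they were checked

    while (defined(my $url = shift @queue)) {
        push @order, $url;
        for my $next ($fetch_links->($url)) {
            next if $seen{$next}++;             # skip anything seen before
            push @queue, $next;                 # otherwise check it later
        }
    }
    return @order;
}

# Simulated link structure containing the loop A->B->C->A.
my %graph = (A => ['B'], B => ['C'], C => ['A']);
my @visited = crawl(sub { @{ $graph{ $_[0] } || [] } }, 'A');
print "@visited\n";   # prints "A B C"
```

Without the %seen check, the loop above would never terminate on this input.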
Hope this helps, and good luck with your homework.