If you need to extract links from an HTML file, read the documentation of HTML::LinkExtor - that module should do what you need.
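As a rough sketch of what that might look like (the helper name `extract_links` and the sample HTML are made up for illustration, and this assumes HTML::LinkExtor is installed from CPAN):

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # CPAN module, shipped with the HTML-Parser distribution

# Hypothetical helper: return the href values of all <a> tags in an HTML string.
sub extract_links {
    my ($html) = @_;
    my @links;

    # The callback is called once per link-bearing tag with the tag name
    # followed by attribute => URL pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
    });

    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up sample input; the <img> is skipped because only <a> tags are kept.
my $html = '<p><a href="http://example.com/a">A</a> <img src="pic.png"> <a href="/b">B</a></p>';
print "$_\n" for extract_links($html);
```

No base URL is passed to the constructor here, so relative links such as `/b` come back unchanged. The module's documentation describes an optional second constructor argument for resolving them against a base.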
As for checking all URLs from a TXT (i.e. not HTML) file, just read them into an array (via some sort of loop), and check them in turn. If a page you check gives you more URLs to check, simply push them onto the end of the array you use to keep the URLs you still have to check.
Be sure to keep a hash of all the URLs already checked, and skip any URL found in it instead of adding it to the "URLs to check" array again - that way, you will avoid going round and round a loop of links like A->B->C->A.
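The array-plus-hash idea could be sketched roughly like this. The `%links_on` table is a made-up stand-in for "fetch the page and extract its links", and `crawl` is a hypothetical name. A real checker would do the actual request where the comment indicates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetching a page and extracting its links.
# Note the loop A -> B -> C -> A.
my %links_on = (
    A => ['B'],
    B => ['C'],
    C => ['A', 'D'],
    D => [],
);

# Visit every URL reachable from the start list exactly once.
sub crawl {
    my ($graph, @start) = @_;
    my %seen;                                       # URLs already queued or checked
    my @to_check = grep { !$seen{$_}++ } @start;    # drop duplicate start URLs
    my @checked;

    while (defined(my $url = shift @to_check)) {
        push @checked, $url;                        # real code would check the URL here
        for my $next (@{ $graph->{$url} || [] }) {
            next if $seen{$next}++;                 # already known, so skip it
            push @to_check, $next;                  # new URL goes on the end of the array
        }
    }
    return @checked;
}

print join(' ', crawl(\%links_on, 'A')), "\n";    # prints "A B C D"
```

Marking a URL as seen at the moment it is pushed, rather than when it is checked, also keeps the same URL from being queued twice while it is still waiting in the array.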
Hope this helps, and good luck with your homework.