Using only modules that come prepackaged with Perl (I'd love it even more if it were just LWP::Simple and CGI), I need to load a domain and crawl through its pages to gather statistics.

How would you go about finding all the links on one page, then all the links on each linked page, and so on, branching out until every link has been exhausted? I'm thinking this probably has to be done with a hash to track whether a particular link has already been scanned.

My other question: when this kind of thing is done, is the data processed or collected DURING the initial search? Or do we typically record all the possible links first, then use LWP::Simple to load each page and do whatever we need with it?

Example code would be better than just a link to Some::Mod on CPAN. If this can't be done easily without other modules, that may work too, but I'd rather not use anything Perl didn't come with.
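
For reference, here is a minimal sketch of the hash-based crawl described above, not a definitive implementation. It assumes LWP::Simple is installed and loads it only when no fetcher is passed in. The names crawl, extract_links, and resolve_link are made up for illustration. Links are pulled out with a crude regex and resolved by hand, so "../" paths, <base> tags, redirects, and robots.txt are all ignored. The optional third argument is a callback that processes each page DURING the crawl; leave it out to just collect the URLs first.

```perl
use strict;
use warnings;

# Pull href values out of <a> tags with a regex. Crude: it can be fooled
# by comments, scripts, or unusual markup that a real HTML parser handles.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\s+(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi) {
        push @links, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @links;
}

# Turn a link found on page $base into an absolute URL.
# Hand-rolled and simplified: "../" segments are not collapsed.
sub resolve_link {
    my ($base, $link) = @_;
    $link =~ s/#.*//s;                                   # drop the fragment
    return if $link eq '' or $link =~ /^(?:mailto|javascript|tel):/i;
    return $link if $link =~ m{^https?://}i;             # already absolute
    my ($scheme, $host, $path) = $base =~ m{^(https?)://([^/?#]+)([^?#]*)}i
        or return;
    return "$scheme:$link"         if $link =~ m{^//};   # protocol-relative
    return "$scheme://$host$link"  if $link =~ m{^/};    # root-relative
    (my $dir = $path) =~ s{[^/]*$}{};                    # directory of the base page
    $dir = '/' if $dir eq '';
    return "$scheme://$host$dir$link";
}

# Breadth-first crawl from $start, staying on the same host.
# %seen is the hash that guarantees each URL is queued only once.
sub crawl {
    my ($start, $fetch, $on_page) = @_;
    # Default fetcher: LWP::Simple::get (assumed to be installed).
    $fetch ||= sub { require LWP::Simple; LWP::Simple::get($_[0]) };
    my ($site) = $start =~ m{^(https?://[^/?#]+)}i
        or die "Bad start URL: $start\n";
    my %seen  = ($start => 1);
    my @queue = ($start);
    my @visited;
    while (defined(my $url = shift @queue)) {
        my $html = $fetch->($url);
        next unless defined $html;                       # fetch failed, skip it
        push @visited, $url;
        $on_page->($url, $html) if $on_page;             # process during the crawl
        for my $link (extract_links($html)) {
            my $abs = resolve_link($url, $link);
            next unless defined $abs;
            next unless lc($abs) eq lc($site)
                     or index(lc $abs, lc "$site/") == 0; # same site only
            next if $seen{$abs}++;                       # already queued or visited
            push @queue, $abs;
        }
    }
    return @visited;
}
```

Calling crawl('http://example.com/') would return the list of pages it managed to fetch, and crawl('http://example.com/', undef, sub { my ($url, $html) = @_; ... }) would hand each page to the callback as it is fetched. Passing your own fetcher as the second argument also makes it easy to try the logic against canned HTML without touching the network.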


In reply to Crawling all urls on a site by Anonymous Monk
