I'm going to be radical and suggest you look at WWW::Robot. This module is designed to walk an entire site, pulling down each page and letting you do what you wish with it.

I'd also suggest splitting the program into two parts. The first part pulls down all the data and saves it to local storage; the second greps the local copy. This way you don't have to wait for the data to be fetched again if you decide to grep for something else or find a bug in your code, and the website maintainer doesn't begin to hate you for taking up silly amounts of bandwidth by fetching the site multiple times.

You can use File::Find to simplify the second part too.
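As a rough sketch of what that second part might look like, here is a case-insensitive keyword search over a local copy using File::Find. The `grep_tree` name and the directory/keyword command-line arguments are my own invention for illustration, not anything from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return "path:line: text" for every line under $dir that matches $pattern.
# grep_tree is a hypothetical helper name, not part of File::Find.
sub grep_tree {
    my ($dir, $pattern) = @_;
    my @hits;
    find({
        no_chdir => 1,    # stay put, so $File::Find::name is usable as-is
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;    # skip directories and other non-files
            open my $fh, '<', $path or do { warn "$path: $!\n"; return };
            while (my $line = <$fh>) {
                next unless $line =~ $pattern;
                chomp $line;
                push @hits, "$path:$.: $line";    # $. is the current line number
            }
            close $fh;
        },
    }, $dir);
    return @hits;
}

# Only run when called as: script.pl <directory> <keyword>
if (@ARGV == 2) {
    my ($dir, $keyword) = @ARGV;
    print "$_\n" for grep_tree($dir, qr/\Q$keyword\E/i);
}
```

The `\Q...\E` treats the keyword as literal text rather than a regex; drop it if you want to search with real patterns.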

Also make sure you obey the robot exclusion rules, and have a delay between getting consecutive URLs so you don't give the server a good kicking.
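If you end up rolling the fetching part yourself rather than using WWW::Robot, one way to get both of those behaviours is LWP::RobotUA from libwww-perl, which consults robots.txt and sleeps between requests to the same host. This is only a sketch: the agent name, e-mail address, `mirror` directory, ten-second delay, and the `local_path_for` and `fetch_all` helpers are all placeholders I made up, and it fetches just the URLs you hand it rather than following links.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use Digest::MD5 qw(md5_hex);
use File::Path qw(make_path);

my $mirror_dir = 'mirror';    # assumed location for the local copy

# Map a URL to a flat, filesystem-safe file name under the mirror directory.
# (Hypothetical helper; hashing avoids dealing with slashes and query strings.)
sub local_path_for {
    my ($url) = @_;
    return "$mirror_dir/" . md5_hex($url) . '.html';
}

# Fetch each URL politely and save successful responses to disk.
sub fetch_all {
    my @urls = @_;

    # Placeholder agent name and contact address; use your own.
    my $ua = LWP::RobotUA->new('my-site-grepper/0.1', 'you@example.com');
    $ua->delay(10 / 60);    # delay() takes minutes, so this is ten seconds

    make_path($mirror_dir);
    for my $url (@urls) {
        # URLs disallowed by robots.txt come back as a 403 without being fetched.
        my $response = $ua->get($url, ':content_file' => local_path_for($url));
        warn "$url: ", $response->status_line, "\n"
            unless $response->is_success;
    }
}

# Only hit the network when URLs are given on the command line.
fetch_all(@ARGV) if @ARGV;
```

Once the pages are on disk you can grep them as often as you like without touching the server again.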


In reply to Re: In need of guidance.... by Molt
in thread Program that will grep website for specified keyword by psykosmily
