I actually wrote a program to do something similar for a class I took, although I was pulling URLs from pages (links to news stories, specifically).

Amel has got you going down the right road with his suggestions. I used LWP::Simple to get the web pages. As far as I can tell, its get() returns only the HTML source of the URL you give it and does not download the graphics the page refers to, so that should already be what you want. It was also suggested to me on here that I look at using a system call to lynx if you want only the rendered text rather than the markup.
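Here is a minimal sketch of the fetch-and-search step, assuming the goal is to print the lines of a page that contain a keyword. The sub names (matching_lines, fetch_source, fetch_text_with_lynx) are made up for illustration; LWP::Simple's get() and lynx's -dump and -nolist options are the real pieces, and the lynx variant assumes lynx is installed and on your PATH.

```perl
use strict;
use warnings;

# Return the lines of $content that contain $keyword (case-insensitive).
# \Q...\E makes the keyword match literally, not as a regex.
sub matching_lines {
    my ($content, $keyword) = @_;
    return grep { /\Q$keyword\E/i } split /\r?\n/, $content;
}

# Fetch the HTML source of one URL. get() returns undef on failure.
sub fetch_source {
    my ($url) = @_;
    require LWP::Simple;
    return LWP::Simple::get($url);
}

# Alternative: have lynx render the page to plain text.
# Assumes a lynx binary is available; returns undef if it cannot be run.
sub fetch_text_with_lynx {
    my ($url) = @_;
    open my $lynx, '-|', 'lynx', '-dump', '-nolist', $url
        or return;
    local $/;                 # slurp the whole output at once
    my $text = <$lynx>;
    close $lynx;
    return $text;
}

# Only does anything when run as: perl script.pl URL KEYWORD
if (@ARGV == 2) {
    my ($url, $keyword) = @ARGV;
    my $content = fetch_source($url);
    die "could not fetch $url\n" unless defined $content;
    print "$_\n" for matching_lines($content, $keyword);
}
```

Swapping fetch_source for fetch_text_with_lynx in the last block searches the rendered text instead of the markup, which avoids matching on tag names and attributes.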

Download time depends on the size of the page you're trying to search, so big pages will obviously take longer. Also be aware that you may have to deal with frames, and if you're going to take that into account, then Amel is right: you're probably looking at some kind of recursion, fetching the source of each frame in turn.
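The frame handling could look something like the sketch below. It is only one way to do it, under a few assumptions: the sub names (frame_sources, resolve, collect_pages) are hypothetical, the regex is a rough stand-in for a real HTML parser, and the depth limit of 3 is an arbitrary choice. URI's new_abs is the real call for turning a relative frame src into a full URL.

```perl
use strict;
use warnings;

# Pull the src attribute out of every <frame> or <iframe> tag.
# A regex is a crude approximation; a proper HTML parser copes better
# with odd markup.
sub frame_sources {
    my ($html) = @_;
    my @srcs;
    while ($html =~ /<i?frame\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi) {
        push @srcs, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @srcs;
}

# Turn a possibly relative src into an absolute URL.
sub resolve {
    my ($src, $base) = @_;
    return $src if $src =~ m{^[a-z][a-z0-9+.-]*:}i;   # already absolute
    require URI;
    return URI->new_abs($src, $base)->as_string;
}

# Return [url, html] pairs for $url and every frame it pulls in.
# $fetch is a code ref that takes a URL and returns HTML or undef.
# %$seen stops frames that point at each other from looping forever.
sub collect_pages {
    my ($url, $fetch, $seen, $depth) = @_;
    $seen ||= {};
    $depth = 3 unless defined $depth;
    return () if $depth < 0 || $seen->{$url}++;

    my $html = $fetch->($url);
    return () unless defined $html;

    my @pages = ([ $url, $html ]);
    for my $src (frame_sources($html)) {
        push @pages, collect_pages(resolve($src, $url), $fetch, $seen, $depth - 1);
    }
    return @pages;
}

# Only does anything when run as: perl script.pl URL
if (@ARGV == 1) {
    require LWP::Simple;
    my @pages = collect_pages($ARGV[0], sub { LWP::Simple::get($_[0]) });
    printf "%s (%d bytes)\n", $_->[0], length $_->[1] for @pages;
}
```

Passing the fetcher in as a code ref keeps the recursion separate from the download, so you can search each returned page for your keyword however you like.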

Hope that points you in a useful direction. :)

Good luck!


In reply to Re: In need of guidance.... by Popcorn Dave
in thread Program that will grep website for specified keyword by psykosmily
