cdherold has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to parse the links in a webpage and then write a routine that will go to the destination of each link and scan for a set of keywords. If the keyword(s) appear in a linked page, the contents of that page will be copied.
I'm using LWP::Simple to retrieve the web page into $webpage. I'm planning on looking into the benefits of using HTML::LinkExtor to collect the links from the page, and then I'll set up a subroutine to loop over them, pull down the contents of each link (using LWP::Simple again), search each linked page for keyword matches, and then determine whether that page needs to be copied.
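Here is a rough sketch of what I have in mind; the start URL, keyword list, and output file naming are just placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::LinkExtor;
use URI;

my $start_url = 'http://www.example.com/';   # placeholder start page
my @keywords  = qw(foo bar);                 # placeholder keywords

my $webpage = get($start_url) or die "Couldn't fetch $start_url\n";

# Collect the href of every <a> tag, resolving relative links against the start URL
my @links;
my $extor = HTML::LinkExtor->new(
    sub {
        my ($tag, %attr) = @_;
        push @links, URI->new_abs($attr{href}, $start_url)
            if $tag eq 'a' && $attr{href};
    }
);
$extor->parse($webpage);

# Visit each link and save any page that mentions one of the keywords
my $count = 0;
for my $link (@links) {
    my $content = get($link) or next;
    next unless grep { $content =~ /\Q$_\E/i } @keywords;
    my $file = "match" . ++$count . ".html";
    open my $fh, '>', $file or die "Can't write $file: $!";
    print {$fh} $content;
    close $fh;
}
```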
If anyone has any hints on a better way to do this or any advice whatsoever, before I get too invested in this, it would be greatly appreciated.
thanks,
perl neophyte cdherold
Replies are listed 'Best First'.
Re: Following Extracted Links
by merlyn (Sage) on Apr 29, 2001 at 19:34 UTC