cajun has asked for the wisdom of the Perl Monks concerning the following question:

I'm getting started with HTML::LinkExtor, which seems to be appropriate for the little task I've dreamed up.

What I want to do is fetch a given web page and extract certain links from it. The example in the HTML::LinkExtor documentation can easily be modified to do this.

Where I'm stuck is the next step: for each of those links extracted from the first page, I then want to fetch that page and extract certain links from it as well. Put another way, I want to run the example script to extract a set of links, then run the script again on each of those extracted links.

I've done a bit of searching (including merlyn's home page) and haven't found anything quite like this yet. Perhaps I've overlooked it. Pointers to similar code or gentle nudges in the right direction would be greatly appreciated.
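
For the sake of illustration, here is a minimal sketch of what I have in mind, built around the HTML::LinkExtor synopsis with LWP::UserAgent doing the fetching. The links_from routine, the /interesting/ filter, and the starting URL are just made-up placeholders to show the two passes, not working code I already have:

    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;

    my $ua = LWP::UserAgent->new;

    # Fetch $url and return all the absolute <a href="..."> links in it.
    sub links_from {
        my $url = shift;
        my @links;
        my $p = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' and $attr{href};
        });
        my $res = $ua->request(HTTP::Request->new(GET => $url),
                               sub { $p->parse($_[0]) });
        return () unless $res->is_success;
        # expand relative links against the page's base URL
        return map { url($_, $res->base)->abs->as_string } @links;
    }

    # first pass: pick out the "certain links" on the starting page
    my $start = 'http://www.example.com/';              # placeholder
    my @first = grep { /interesting/ } links_from($start);

    # second pass: run the same extraction on each of those links
    for my $link (@first) {
        print "$link\n";
        print "    $_\n" for grep { /interesting/ } links_from($link);
    }

Essentially I want to know whether this two-pass approach is the right way to go about it, or whether there is a more idiomatic way.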

Re: Extracting links
by clemburg (Curate) on Jun 20, 2001 at 13:44 UTC
Re: Extracting links
by E-Bitch (Pilgrim) on Jun 20, 2001 at 18:53 UTC
    I know this probably isn't the best solution, but what you can do is use File::Find on each of the resulting links. The problem with this is that you can only check links that are on your own server. And, of course, a page with 75 links means 75 finds, and each of those could in turn produce, say, 30 more links, for a total of (well, you get the picture) a lot of wasted processor time; a small sketch of cutting down the repeat fetches follows below. On second thought, read the article mentioned in the above post...

    Sorry for droning

    thanks!
    E-Bitch
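
    To keep that explosion in check a little, a %seen hash at least stops you from fetching the same page twice. A rough sketch, assuming the links_from routine and @first list from the sketch in the question above (both are just illustrative names):

        # a %seen hash keeps the second pass from fetching the same page twice
        my %seen;
        for my $link (@first) {
            next if $seen{$link}++;
            print "$link\n";
            print "    $_\n" for grep { /interesting/ } links_from($link);
        }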