Mur has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a spider tool that can be dropped into our existing web application. The ideal one would take a URL base path, and invoke a Perl routine every time it found a link or page from the same site. It needs to be fully aware of cookies, session IDs, and all the major authorization models.
--
Jeff Boes
Database Engineer
Nexcerpt, Inc.
vox 269.226.9550 ext 24
fax 269.349.9076
 http://www.nexcerpt.com
...Nexcerpt...Connecting People With Expertise

Replies are listed 'Best First'.
Re: Need spidering solution
by pzbagel (Chaplain) on Jul 31, 2003 at 17:52 UTC

    Since you are asking this on PerlMonks, I assume you actually want to write the spider yourself, right? Check out O'Reilly's Perl & LWP. It discusses how to use the LWP bundle of modules to write web clients, including spiders. It also covers some of the pitfalls of spiders and how to avoid them. It's an invaluable resource if you hope to download and parse HTML automatically.
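
    To make the LWP approach concrete, here is a minimal sketch of a same-site spider that invokes a callback for every page it fetches. It is not taken from the book. The names crawl_site, max_pages, and the agent string are assumptions made for illustration, and the in-memory cookie jar only covers cookie-based sessions. Other authorization schemes would need extra setup on the user agent (for example, LWP::UserAgent's credentials method for HTTP Basic auth).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# crawl_site() is a hypothetical helper. It walks pages breadth-first from
# $start, stays on the starting host, and calls $on_page->($url, $response)
# for every page fetched successfully. Returns the number of pages fetched.
sub crawl_site {
    my ($start, $on_page, %opt) = @_;
    my $max = $opt{max_pages} || 50;            # assumed safety limit
    my $ua  = $opt{ua} || LWP::UserAgent->new(
        agent      => 'example-spider/0.1',     # placeholder agent string
        cookie_jar => {},                       # in-memory jar keeps session cookies
        timeout    => 15,
    );

    my $host    = lc URI->new($start)->host;
    my @queue   = ($start);
    my %seen    = ($start => 1);
    my $fetched = 0;

    while (@queue && $fetched < $max) {
        my $url = shift @queue;
        my $res = $ua->get($url);
        next unless $res->is_success;
        $fetched++;
        $on_page->($url, $res);

        # Only HTML pages are scanned for further links.
        next unless $res->content_type eq 'text/html';

        # Giving LinkExtor a base URL makes it return absolute links.
        my $extor = HTML::LinkExtor->new(undef, $res->base);
        $extor->parse($res->decoded_content // '');
        $extor->eof;

        for my $link ($extor->links) {
            my ($tag, %attr) = @$link;
            next unless $tag eq 'a' && defined $attr{href};
            my $uri = URI->new("$attr{href}");
            next unless $uri->scheme && $uri->scheme =~ /^https?$/;
            next unless lc $uri->host eq $host;  # skip off-site links
            $uri->fragment(undef);               # ignore #anchors
            push @queue, "$uri" unless $seen{"$uri"}++;
        }
    }
    return $fetched;
}

# Example call (performs real HTTP requests, so it is left commented out):
# crawl_site('http://www.example.com/', sub {
#     my ($url, $res) = @_;
#     print "Fetched $url (", $res->code, ")\n";
# }, max_pages => 10);
```

    Passing a preconfigured user agent via the ua option keeps authentication and politeness policy out of the crawl loop. For instance, LWP::RobotUA could be supplied instead if robots.txt should be honoured.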

    HTH

Re: Need spidering solution
by perrin (Chancellor) on Jul 31, 2003 at 19:36 UTC
Re: Need spidering solution
by blue_cowdawg (Monsignor) on Jul 31, 2003 at 17:38 UTC

    Take a look at LWP::UserAgent and possibly HTML::TokeParser and friends.
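
    As a rough illustration of the HTML::TokeParser side, the sketch below pulls the links out of a page and keeps only those on the same host as a base URL. The helper name same_site_links and the example URLs are assumptions made for illustration. Fetching the page (with LWP::UserAgent, say) is left out so the snippet runs without network access.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# same_site_links() is a hypothetical helper. It returns the absolute URLs
# of all <a href="..."> links in $html whose host matches that of $base_url.
sub same_site_links {
    my ($html, $base_url) = @_;
    my $base   = URI->new($base_url);
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my (%seen, @links);
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};              # attribute hash is element 1
        next unless defined $href;
        my $uri = URI->new_abs($href, $base);    # resolve relative links
        next unless $uri->scheme && $uri->scheme =~ /^https?$/;
        next unless lc $uri->host eq lc $base->host;
        $uri->fragment(undef);                   # ignore #anchors
        my $abs = $uri->as_string;
        push @links, $abs unless $seen{$abs}++;  # drop duplicates
    }
    return @links;
}

# Small usage example with an inline HTML fragment.
my $html = '<a href="/about">About</a> <a href="http://other.example/x">Elsewhere</a>';
print "$_\n" for same_site_links($html, 'http://www.example.com/');
```

    Only the /about link survives here, since the second one points at a different host.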


    Peter @ Berghold . Net

    Seize the cow! Bite the day!

    Test the code? We don't need to test no stinkin' code! All code posted here is as is where is unless otherwise stated.

    Brewer of Belgian style Ales