NB: I wrote a chapter on Perl web clients for "Professional Perl Development" by Wrox Press.
You could speed up your spidering using:
- Threading
- fork()
The first is neater; the second is easier, but you have to be careful you don't fork-bomb your machine, e.g. by capping the number of children as in the sketch below.
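Something along these lines is what I mean by being careful with fork() (untested sketch, the URL list and the cap of 5 children are just placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    # Placeholder list of URLs; in practice this would be your spider queue.
    my @urls = ('http://www.example.com/', 'http://www.example.org/');

    my $max_children = 5;    # cap concurrent children so we don't fork-bomb
    my $children     = 0;

    foreach my $url (@urls) {
        # If we're at the cap, block until one child exits before forking again.
        if ($children >= $max_children) {
            waitpid(-1, 0);
            $children--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: fetch the page, do whatever processing, then exit.
            my $content = get($url);
            # ... process $content here ...
            exit 0;
        }
        $children++;
    }

    # Reap any children still running.
    waitpid(-1, 0) for 1 .. $children;

If you'd rather not manage the bookkeeping yourself, Parallel::ForkManager from CPAN wraps this same pattern up neatly.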
As for being careful about which URLs you spider, there's WWW::RobotRules to help you parse and obey robots.txt files.
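Roughly how you'd use it (untested sketch, the agent name and example.com URLs are just placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::RobotRules;
    use LWP::Simple qw(get);

    # Name the rules object after your spider's User-Agent string.
    my $rules = WWW::RobotRules->new('MySpider/1.0');

    # Fetch and parse the site's robots.txt before spidering it.
    my $robots_url = 'http://www.example.com/robots.txt';
    my $robots_txt = get($robots_url);
    $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

    # Only fetch pages robots.txt allows us to.
    my $page = 'http://www.example.com/some/page.html';
    if ($rules->allowed($page)) {
        my $content = get($page);
        # ... spider $content ...
    }

Alternatively, LWP::RobotUA is a drop-in replacement for LWP::UserAgent that does the robots.txt handling (and polite request spacing) for you.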
If you don't fancy buying the book, you can download the examples I wrote from here.
--
jodrell.uk.net