PerlMonks
Re: Benign Web Miner
by Arunbear (Prior) on Sep 30, 2006 at 19:04 UTC
If you don't mind a non-Perl solution, you might consider Heritrix. It is capable of large-scale crawling, is kind to the hosts it visits (if a host takes n seconds to respond, it will wait m*n seconds before hitting that host again; m is configurable but defaults to 5), and has extremely flexible crawl settings.
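
To make the m*n rule concrete, here is a rough sketch of how such a politeness delay could be tracked per host. This is not Heritrix's code or configuration: the class name `PolitenessScheduler`, its methods, and the `delay_factor` parameter are made up for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```python
from urllib.parse import urlsplit


class PolitenessScheduler:
    """Hypothetical helper: tracks when each host may be contacted again."""

    def __init__(self, delay_factor=5.0):
        # delay_factor plays the role of "m" in the post (assumed default of 5).
        self.delay_factor = delay_factor
        self._next_allowed = {}  # host -> earliest time of the next request

    def record_fetch(self, url, finished_at, response_seconds):
        """Note that a fetch ended at finished_at after taking response_seconds (n)."""
        host = urlsplit(url).hostname
        # Wait m * n seconds, counted from the moment the last fetch finished.
        self._next_allowed[host] = finished_at + self.delay_factor * response_seconds

    def seconds_to_wait(self, url, now):
        """Return how long to sleep before requesting url; 0.0 if no wait is needed."""
        host = urlsplit(url).hostname
        return max(0.0, self._next_allowed.get(host, now) - now)


scheduler = PolitenessScheduler()
# The host took 2 seconds to answer, so it is left alone for 5 * 2 = 10 seconds.
scheduler.record_fetch("http://example.com/a", finished_at=100.0, response_seconds=2.0)
wait_same_host = scheduler.seconds_to_wait("http://example.com/b", now=103.0)
wait_other_host = scheduler.seconds_to_wait("http://example.org/", now=103.0)
```

The point of scaling the wait by the response time is that a slow (possibly overloaded) host automatically gets hit less often, while a fast one can be crawled more briskly. A real crawler would also honor robots.txt and probably clamp the delay between a minimum and a maximum.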