toro has asked for the wisdom of the Perl Monks concerning the following question:
I read http://robotstxt.org and therefore know that I should identify my LWP::UserAgent and leave a contact email when I crawl robotically. I'd also like to avoid bombarding the servers I poke, especially during periods of high traffic for them.
Currently I am sleeping between jobs and running cron jobs at boring American hours. Is there a more adaptive way (perhaps within LWP) to pause a crawl when a server says it's busy?
Also, is there anything else I should know about being a polite web crawler?
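To make the question concrete, here is a rough sketch of what "adaptive" pausing could look like: treat a 503 (or 429) status as the server saying it is busy, honor a `Retry-After` header when it is given in seconds, and otherwise back off exponentially. The names `pause_for` and `polite_get`, the agent string, the contact address, and the 30-second base and 15-minute cap are all made up for illustration; this is not something LWP does on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: how many seconds to pause after a response.
# Returns 0 when the status does not signal a busy server.
sub pause_for {
    my ($code, $retry_after, $attempt) = @_;
    return 0 unless $code == 503 || $code == 429;

    # Honor Retry-After when it is a plain number of seconds.
    # (The HTTP-date form is not parsed in this sketch.)
    return $retry_after + 0
        if defined $retry_after && $retry_after =~ /^\d+$/;

    # Otherwise back off exponentially: 30s, 60s, 120s, ...
    # capped at 15 minutes (both numbers are arbitrary assumptions).
    my $delay = 30 * 2 ** $attempt;
    return $delay > 900 ? 900 : $delay;
}

# Hypothetical wrapper: fetch a URL, pausing and retrying while the
# server reports that it is busy, up to $max_tries attempts.
sub polite_get {
    my ($url, $max_tries) = @_;

    # Loaded here so pause_for() can be used without LWP installed.
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(
        agent => 'example-crawler/0.1',   # placeholder robot name
        from  => 'you@example.com',       # placeholder contact email
    );

    my $res;
    for my $attempt (0 .. $max_tries - 1) {
        $res = $ua->get($url);
        my $wait = pause_for($res->code,
                             scalar $res->header('Retry-After'),
                             $attempt);
        last unless $wait;
        sleep $wait;
    }
    return $res;
}
```

A call such as `my $res = polite_get('http://example.com/', 5);` would then give up after five busy responses and return the last one. LWP::RobotUA, which ships with libwww-perl, may also be worth a look, since it consults robots.txt and enforces a delay between requests to the same host.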
Re: Waiting my turn politely with LWP
by Anonymous Monk on Jun 17, 2011 at 08:19 UTC
by toro (Beadle) on Jun 17, 2011 at 08:22 UTC