in reply to Re^3: Ethical issues with screen scraping
in thread Use WWW::Mechanize to Download Pictures of Sayuri Anzu
Unlike that monster on the White House web site, this one is rather small:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /journals/EJDE/Monographs/
    Disallow: /journals/EJDE/Volumes/
From what I read yesterday about robots.txt files, I'm OK, since I'm scraping the results of a search page that resides in a different directory.
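One way to double-check that reading of the rules is to feed them to a robots.txt parser instead of eyeballing them. The following is only a sketch in Python using the standard library's `urllib.robotparser`; the host `example.org`, the agent name `MyScraper`, and the path `/search/results.html` are made-up placeholders, since the real search URL isn't given here.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /journals/EJDE/Monographs/
Disallow: /journals/EJDE/Volumes/
"""


def is_allowed(path, agent="MyScraper"):
    """Return True if the quoted rules permit `agent` to fetch `path`.

    The host and agent name are placeholders (assumptions); only the
    path portion of the URL is matched against the Disallow rules.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.org" + path)


# Hypothetical search-results path, outside the disallowed directories.
print(is_allowed("/search/results.html"))          # True
# A path under one of the disallowed directories.
print(is_allowed("/journals/EJDE/Volumes/2004/"))  # False
```

In a real script you would probably call `set_url()` and `read()` on the live robots.txt rather than pasting the rules in, so that changes on the server are picked up.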
But your advice about asking the webmaster about an appropriate delay is well taken; I'll see if I can contact him. I'm sure this is quite a capable server, since it's a service of the European Mathematical Society. Plus, there are several mirrors.
In general, though, are you saying that even if I'm accessing high-bandwidth servers, I should be using at least a two-second delay?
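For concreteness, this is roughly what I mean by a fixed delay between requests. It is a rough Python sketch, not anything from my actual script: `fetch_politely` and its parameters are names I made up, the two-second default is just the figure from the question, and `fetch` stands in for whatever actually downloads a page.

```python
import time


def fetch_politely(urls, fetch, delay=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each URL, leaving at least `delay` seconds
    between the end of one request and the start of the next.

    `sleep` and `clock` are injectable so the pacing can be checked
    without really waiting; by default they use the real clock.
    """
    results = []
    last_finished = None
    for url in urls:
        if last_finished is not None:
            # Only wait for whatever part of the delay has not already elapsed.
            remaining = delay - (clock() - last_finished)
            if remaining > 0:
                sleep(remaining)
        results.append(fetch(url))
        last_finished = clock()
    return results
```

If the webmaster suggests a different interval, only the `delay` argument changes.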
TheEnigma
Re^5: Ethical issues with screen scraping
by Ovid (Cardinal) on Aug 19, 2004 at 20:02 UTC