Unlike that monster on the White House web site, this one is rather small:
User-agent: *
Disallow: /cgi-bin/
Disallow: /journals/EJDE/Monographs/
Disallow: /journals/EJDE/Volumes/
From what I read yesterday about robots.txt files, I'm OK, since I'm scraping the results of a search page that resides in a different directory.
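For anyone who wants to double-check a reading like this rather than eyeball it, here is a rough sketch using Python's standard `urllib.robotparser`, fed the rules quoted above. The path `/search/results` is a made-up stand-in for the search page (I haven't given the real path here), and `is_allowed` is just a helper name I picked for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /journals/EJDE/Monographs/
Disallow: /journals/EJDE/Volumes/
"""


def is_allowed(path, agent="*"):
    """Return True if the rules above permit `agent` to fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "/search/results" is a hypothetical path, not the real search page.
    for path in ("/search/results", "/cgi-bin/query",
                 "/journals/EJDE/Volumes/2004/"):
        print(path, is_allowed(path))
```

This only tells you what the file permits; it says nothing about how hard you may hit the server, which is a separate question.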
But your advice about asking the webmaster about an appropriate delay is well taken; I'll see if I can contact him. I'm sure this is quite a capable server, since it's a service of the European Mathematical Society. Plus, there are several mirrors.
In general, though, are you saying that even when I'm accessing high-bandwidth servers, I should use a delay of at least two seconds?
TheEnigma
In reply to Re^4: Ethical issues with screen scraping
by TheEnigma
in thread Use WWW::Mechanize to Download Pictures of Sayuri Anzu
by Anonymous Monk