in reply to Re^2: Use WWW::Mechanize to Download Pictures of Sayuri Anzu
in thread Use WWW::Mechanize to Download Pictures of Sayuri Anzu
There are a few issues involved here. The first, of course, is determining what the Terms of Service or "Fair Use" policy of the site in question allows. Do they disallow screen scraping? Do they have a robots.txt file that disallows your program from accessing the files in question? If so, respecting that is important etiquette. For an example, take a look at the robots.txt file in the root directory of the White House Web site.
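To make the robots.txt check concrete, here is a minimal sketch. It is in Python rather than Perl, using only the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the agent name `my-scraper` are all made up for illustration. A real program would fetch the file from the site's root instead of inlining it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without
# touching the network. A real site's file will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Crawl-delay: 5
"""

# Hypothetical user-agent name for the scraper.
AGENT = "my-scraper"


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# Ask before fetching: is this agent allowed to request this URL?
image_allowed = rules.can_fetch(AGENT, "http://example.com/images/photo.jpg")
index_allowed = rules.can_fetch(AGENT, "http://example.com/index.html")

# Some sites also publish a requested delay between requests.
requested_delay = rules.crawl_delay(AGENT)

print(image_allowed, index_allowed, requested_delay)
```

With the made-up rules above, the image URL is refused and the index page is allowed. Perl users may want to look at LWP::RobotUA, which is meant to handle this kind of check for you.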
Assuming there are no ethical objections to writing your program, it might be a good idea to contact the Webmaster of the site you are scraping and ask what delay between requests would be appropriate. As tilly pointed out, if someone is serving CGIs off an old computer at home, even your two-second delay could be problematic.
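Once you have agreed on a delay, enforcing it is simple. The following is only a sketch, again in Python, and the `Throttle` class and its parameters are names invented for this example rather than part of any library. The clock and sleep functions can be swapped out, which makes it possible to test the logic without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock      # returns the current time in seconds
        self._sleep = sleep      # blocks for the given number of seconds
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor the delay; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record when this request is being released.
        self._last = self._clock()
        return slept
```

In a download loop you would create one `Throttle(delay_seconds)` with whatever value the Webmaster suggests, then call `wait()` immediately before each request. The first call returns at once, and later calls only sleep for whatever part of the delay has not already elapsed.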
Cheers,
Ovid