As suggested by Abigail-II in Re: Web Robot, a polite robot should follow certain guidelines, among them respecting robots.txt.
I was further interested to learn in Chip Salzenberg's letter at geeksunite that: "Federal courts have upheld that web spiders must obey the established robots.txt mechanism by which web site owners limit automated access and that a failure to obey robots.txt constitutes trespass".
However, I'm confused about who robots.txt is intended for. I understand that robots.txt applies to heavy-duty web spiders and indexers, such as a Google robot. But does it also apply to little screen-scraping tools written by private individuals? For example, suppose I write a little tool using LWP::UserAgent or WWW::Mechanize (rather than, say, LWP::RobotUA or WWW::Mechanize::Polite, if such a module exists) that simply collects a number of web pages for me while I sleep. Is it illegal or unethical for such a scraper to ignore robots.txt?
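For what it's worth, honoring robots.txt costs almost nothing: the mechanism is just a per-path allow/deny list that the scraper consults before each fetch. In Perl, LWP::RobotUA does this automatically (via WWW::RobotRules); since that module isn't core Perl, here is a minimal, self-contained sketch of the same mechanism using Python's stdlib urllib.robotparser. The robots.txt content, the user-agent name, and the example.com URLs are all made up for illustration.

```python
# Sketch of the robots.txt check a polite scraper performs before each
# fetch. In Perl this is what LWP::RobotUA / WWW::RobotRules do for you.
from urllib.robotparser import RobotFileParser

# A site owner publishes rules like these at http://example.com/robots.txt:
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check each URL before fetching it ("MyLittleScraper" is a made-up agent
# name; it falls under the "User-agent: *" rule above):
print(rp.can_fetch("MyLittleScraper", "http://example.com/private/report.html"))  # False
print(rp.can_fetch("MyLittleScraper", "http://example.com/public/index.html"))    # True
```

In other words, whatever the legal answer turns out to be, the technical burden of compliance is a single lookup per URL.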
If a commercial company sells a tool that lets non-programmer end users write little screen-scraping robots, is it unethical or illegal for such a product not to provide a mechanism by which those end users can respect robots.txt?
In reply to [OT] Ethical and Legal Screen Scraping by eyepopslikeamosquito