in reply to LWP::UserAgent Bad and Forbidden requests
Hi taioba,
You have run afoul of the Robots Exclusion Protocol. Many websites prefer that real humans with real eyeballs visit their site, and some feel strongly enough to ban software "robots" such as scripts built on LWP::UserAgent. Sciencedirect.com is one of these. If you look at the robots.txt file for sciencedirect.com, you'll see they only let the big boys (Google, et al.) spider their site. All others (including you) can go suck rocks.

There is no (legit) solution to this problem except to call the webmasters and convince them that it is in their interest to allow your program to crawl their site. Good luck with that. Alternatively, see if the site has an RSS feed or an API that provides the data you need. APIs especially are less subject to interdiction by webmasters, since they are designed for program-to-program integration.
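If you want to see how a robots.txt of that kind treats your program, here is a minimal sketch using WWW::RobotRules, which is distributed alongside libwww-perl. The robots.txt text, the example.com URLs, the `MyCrawler` agent name and the `may_fetch` helper are all made up for illustration; none of it is copied from sciencedirect.com.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robots.txt content, NOT taken from any real site:
# Googlebot may crawl everything, every other robot is shut out.
my $robots_txt = <<'END';
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
END

# Hypothetical URLs, used only for illustration.
my $robots_url = 'http://www.example.com/robots.txt';
my $page_url   = 'http://www.example.com/science/article/1';

# Return 1 if a robot named $agent may fetch $url under the rules above,
# 0 otherwise. Nothing is downloaded; the rules are parsed from the string.
sub may_fetch {
    my ($agent, $url) = @_;
    my $rules = WWW::RobotRules->new($agent);
    $rules->parse($robots_url, $robots_txt);
    return $rules->allowed($url) ? 1 : 0;
}

for my $agent ('Googlebot', 'MyCrawler') {
    my $verdict = may_fetch($agent, $page_url) ? 'allowed' : 'disallowed';
    print "$agent: $verdict\n";
}
```

With these made-up rules, Googlebot should come out as allowed and MyCrawler as disallowed. If you would rather have the check applied automatically before every request, LWP::RobotUA (also part of libwww-perl) is built on the same rules parser.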
Cheers,
Larry
Replies are listed 'Best First'.
Re^2: LWP::UserAgent Bad and Forbidden requests
by Corion (Patriarch) on Dec 15, 2011 at 19:30 UTC
by 1arryb (Acolyte) on Dec 15, 2011 at 19:54 UTC
by Corion (Patriarch) on Dec 16, 2011 at 07:26 UTC
Re^2: LWP::UserAgent Bad and Forbidden requests
by taioba (Acolyte) on Dec 17, 2011 at 16:37 UTC