in reply to Cannot retrieve HTML for some pages with LWP

Retrieving that URL gives you a 403 Forbidden error, with an error page that points you at http://www.google.com/terms_of_service.html . This is in place because Google bars automated querying of its site. LWP::Simple's get function gives you no way to see the response code, so you wouldn't have seen that the request failed. (If you want such information, use LWP::UserAgent instead.) Instead, get just returns undef on failure, which prints as an empty string, as you saw.
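
To make the LWP::UserAgent suggestion concrete, here is a minimal sketch of checking the response status. The helper name describe_response is made up for illustration, and the URL is whatever you pass on the command line; treat it as a starting point rather than a finished script.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): summarize an HTTP::Response.
sub describe_response {
    my ($response) = @_;
    return $response->is_success
        ? 'OK: ' . length($response->content) . ' bytes'
        : 'Failed: ' . $response->status_line;
}

# Only fetch when run directly with a URL argument,
# e.g. perl check.pl http://www.example.com/
if (!caller and @ARGV) {
    my $url      = shift @ARGV;
    my $ua       = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->get($url);    # returns an HTTP::Response object
    print describe_response($response), "\n";
}
```

Unlike LWP::Simple's get, this leaves you with the response object, so a refused request shows up as a status line such as "403 Forbidden" instead of silently yielding nothing.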

Re^2: Cannot retrieve HTML for some pages with LWP
by hitheone (Initiate) on May 27, 2005 at 17:20 UTC
    Thanks for your reply. I have the same problem with LWP::UserAgent, and I understand why it happens. However, how can I retrieve web data the way a browser does? I mean, how can I recognize the measures that block automated access to a web page and work around them?

      Firstly, please be aware of the issues surrounding accessing Google's site in contravention of their terms of service.

      It might be easier for you to use Google's own web APIs, assuming they work for Google Scholar. Look into Net::Google for examples which use ordinary Google search.

      If, after all that, you still want to scrape Google Scholar, you may have some luck modifying WWW::Scraper::Google.