in reply to Re: automated search using perl
in thread automated search using perl

The trick you mention using the url "http://www.google.com/search?q=perl" (and various extensions) works just fine from Firefox or IE browsers.

HOWEVER, this fails when I request the same URL from Perl, as in:

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $url);
my $res = $ua->request($req);

Instead, Google returns a long error message, from which I have excerpted the following key snippet:

Your client does not have permission to get URL xxxx from this server. (Client IP address: xxx.xxx.xxx.xxx). Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html

I suppose that Google recognizes that I am not using a standard browser and assumes (perhaps correctly) that I must be violating their TOS. Is there any way to get around this, i.e. can I make my Perl code look like a valid browser? I would prefer a simple method that doesn't require in-depth knowledge of (or access to) Google APIs or other specialized Perl modules.

Re^3: automated search using perl
by ashish.kvarma (Monk) on Jul 21, 2009 at 05:43 UTC
    Google (or any server) can identify the client from the 'User-Agent' header. The default value for LWP::UserAgent is 'libwww-perl/#.##' (see the LWP::UserAgent documentation). You can change it by setting the agent attribute to any valid user-agent string. Here is one example:
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/5.0');
    # Alternative agent strings to try:
    #$ua->agent('Checkbot/0.4 ');
    #$ua->agent('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');
    #$ua->agent('FireFox 2');
    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->request($req);
    if ($res->is_success) {
        print $res->content;    # or whatever
    } else {
        die $res->status_line;
    }
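If you want to check which User-Agent header LWP will actually send before pointing it at a real server, something like the following rough sketch should work without touching the network. It assumes a reasonably recent libwww-perl; the search URL is just the one from the question, and the variable names are only for illustration. It also shows why the 'Checkbot/0.4 ' variant has a trailing space: when the agent string ends in whitespace, LWP appends its default 'libwww-perl/#.##' identifier to it.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# The default agent string, before any override (libwww-perl/<version>).
my $default_agent = $ua->agent;
print "default:  $default_agent\n";

# Override it, then let LWP fill in the request headers without sending anything.
$ua->agent('Mozilla/5.0');
my $req      = HTTP::Request->new(GET => 'http://www.google.com/search?q=perl');
my $prepared = $ua->prepare_request($req);
print "override: ", $prepared->header('User-Agent'), "\n";

# A trailing space asks LWP to append its default identifier to the string.
$ua->agent('Checkbot/0.4 ');
print "appended: ", $ua->agent, "\n";
```

Changing the header only changes how the client identifies itself; whether the server accepts the request is still up to the server.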
    Regards,
    Ashish