abhipesit has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Perl Monks. I need a script that enters a keyword into the Google search page and then fetches the links on the results page.
I am able to fetch the links on the current page, but I can't work out how to enter the search keyword on the Google home page.
I'm using the following code:
#!/usr/bin/perl
use strict;
use warnings;

# Include the WWW::Mechanize module
use WWW::Mechanize;

my $url = "http://www.google.com";

# Create a new instance of WWW::Mechanize
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page
$mechanize->get($url);

# Mention the keyword to be searched
my $keyword = 'perl';
Now, how do I enter the keyword into the Google page? I tried $mechanize->forms, but it's not working. Can someone please help me out? Any help will be truly appreciated.

Re: automated search using perl
by CountZero (Bishop) on Jul 13, 2009 at 20:03 UTC
    Don't do it. You will get yourself into trouble. Use the Google API.

    See Google API (http://code.google.com/apis/) and the module WWW::Google::API on CPAN.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Your first link is broken.

      I've taken a look at WWW::Google::API and its code. After a quick inspection (I haven't tested it), it seems to provide only authentication; I don't think it does anything beyond that.

      I hope you can prove me wrong, because I'd like to use the module if it provides what the OP asked for.

        This link works: Google Base API.

        I think the module can do more than just authentication (several modules subclass it), but it is indeed rather "bare". If you need a higher-level interface, have a look at DBD::Google.

        CountZero


Re: automated search using perl
by Corion (Patriarch) on Jul 13, 2009 at 19:57 UTC

    How is WWW::Mechanize failing for you? Instead of ->forms(), try ->dump_forms() to see which forms and fields the page actually contains. Also see Google's Terms of Service, which likely prohibit using WWW::Mechanize or anything like it to scrape its results.

Re: automated search using perl
by ashish.kvarma (Monk) on Jul 14, 2009 at 09:04 UTC
      The trick you mention, using the URL "http://www.google.com/search?q=perl" (and various extensions), works just fine from Firefox or IE.

      HOWEVER, this fails when I request the same URL from Perl, as in:

      use LWP::UserAgent;
      use HTTP::Request;

      my $ua  = LWP::UserAgent->new;
      my $req = HTTP::Request->new(GET => $url);
      my $res = $ua->request($req);
      Instead, Google returns a long error message, of which I have excerpted the following key snippet:

      Your client does not have permission to get URL xxxx from this server. (Client IP address: xxx.xxx.xxx.xxx). Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html

      I suppose that Google recognizes that I am not using a standard browser and assumes (perhaps correctly) that I must be violating their TOS. Is there any way to get around this? In other words, can I make my Perl code look like a valid browser? I would prefer a simple method that doesn't require in-depth knowledge of (or access to) Google APIs or other specialized Perl modules.

        Google (or any server) can identify the client from the User-Agent header. The default value for LWP::UserAgent is 'libwww-perl/#.##' (see the LWP::UserAgent documentation). You can change it by setting the agent attribute to any valid user-agent string. Here is one example:
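        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTTP::Request;

        my $url = 'http://www.google.com/search?q=perl';
        my $ua  = LWP::UserAgent->new;

        # Override the default 'libwww-perl/#.##' User-Agent string
        $ua->agent('Mozilla/5.0');
        #$ua->agent('Checkbot/0.4 ');
        #$ua->agent('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');
        #$ua->agent('FireFox 2');

        my $req = HTTP::Request->new(GET => $url);
        my $res = $ua->request($req);
        if ($res->is_success) {
            print $res->content;    # or whatever
        }
        else {
            die $res->status_line;
        }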
        Regards,
        Ashish
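Since the question asks for a simple method without specialized modules, here is a minimal sketch that sets the User-Agent with HTTP::Tiny instead of LWP::UserAgent. This is a swap, not the code from the reply: HTTP::Tiny has shipped with core Perl since 5.14, so nothing extra needs installing. The helper names make_client and fetch_as_browser and the 'Mozilla/5.0' string are illustrative assumptions. Changing the header does not change what Google's Terms of Service permit.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Hypothetical helper: build a client that sends $agent as its
# User-Agent header instead of HTTP::Tiny's default agent string.
sub make_client {
    my ($agent) = @_;
    return HTTP::Tiny->new(agent => $agent);
}

# Hypothetical helper: GET $url with the given agent and return the body,
# or die with the HTTP status and reason on failure.
sub fetch_as_browser {
    my ($url, $agent) = @_;
    my $response = make_client($agent)->get($url);
    die "$response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}
```

Usage would be something like `print fetch_as_browser('http://www.google.com/search?q=perl', 'Mozilla/5.0');`. Keeping client construction in its own sub lets you check the agent string without making a network request.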
      If you are looking for "Advanced Search", just run a search in the browser and look at the resulting URL; the parameters should be self-explanatory.
      Regards, Ashish
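Following that advice, you can put the search terms in the query string instead of typing them into the form. This minimal sketch assumes the results page is http://www.google.com/search with the keyword in the q parameter, as in the URL discussed in this thread. The start parameter used here for the next page of results, and the helper names uri_escape_simple and search_url, are assumptions; check them against the URL your own browser shows. The Terms of Service caveats raised earlier in the thread still apply.

```perl
use strict;
use warnings;

# Percent-encode every byte outside the RFC 3986 "unreserved" set.
# Assumes a byte string (encode wide characters to UTF-8 first).
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: append the encoded parameters to $base.
# Keys are sorted so the resulting URL is deterministic.
sub search_url {
    my ($base, %params) = @_;
    my $query = join '&',
        map { uri_escape_simple($_) . '=' . uri_escape_simple($params{$_}) }
        sort keys %params;
    return $query eq '' ? $base : "$base?$query";
}

# 'start' is an assumed pagination parameter (offset of the first result)
my $first_page  = search_url('http://www.google.com/search', 'q' => 'perl mechanize');
my $second_page = search_url('http://www.google.com/search', 'q' => 'perl mechanize', start => 10);
print "$first_page\n$second_page\n";
```

The resulting URL can be passed straight to `$mechanize->get(...)` or to the LWP::UserAgent request in the reply above, so no form filling is needed. For production code, URI and URI::Escape on CPAN handle encoding more thoroughly than this hand-rolled version.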