PerlMonks  

Monitor queries for new finds added to the Google index yesterday

by Scott7477 (Chaplain)
on Nov 03, 2006 at 16:56 UTC

Scott7477 has asked for the wisdom of the Perl Monks concerning the following question:

I grabbed the following code from this hack that is excerpted from the book "Google Hacks."
    # goonow.pl
    # Feeds queries specified in a text file to Google, querying
    # for recent additions to the Google index. The script appends
    # to CSV files, one per query, creating them if they don't exist.
    # usage: perl goonow.pl [query_filename]

    # My Google API developer's key.
    my $google_key='insert key here';

    # Location of the GoogleSearch WSDL file.
    my $google_wdsl = "GoogleSearch.wsdl";

    use strict;
    use SOAP::Lite;
    use Time::JulianDay;

    $ARGV[0] or die "usage: perl goonow.pl [query_filename]\n";

    my $julian_date = int local_julian_day(time) - 2;

    my $google_search = SOAP::Lite->service("file:$google_wdsl");

    open QUERIES, $ARGV[0] or die "Couldn't read $ARGV[0]: $!";

    while (my $query = <QUERIES>) {
        chomp $query;
        warn "Searching Google for $query\n";
        $query .= " daterange:$julian_date-$julian_date";

        (my $outfile = $query) =~ s/\W/_/g;
        open (OUT, ">> $outfile.csv")
            or die "Couldn't open $outfile.csv: $!\n";

        my $results = $google_search ->
            doGoogleSearch(
                $google_key, $query, 0, 10, "false", "", "false",
                "", "latin1", "latin1"
            );

        # '=>' is a comma, not a comparison, so the original test
        # "if ($results => "")" could never fire; check definedness instead.
        unless (defined $results) { die "The soap call failed! \n" }

        foreach (@{$results->{'resultElements'}}) {
            print OUT '"' . join('","', (
                map {
                    s!\n!!g;      # drop spurious newlines
                    s!<.+?>!!g;   # drop all HTML tags
                    s!"!""!g;     # double escape " marks
                    $_;
                } @$_{'title','URL','snippet'}
            ) ) . "\"\n";
        }

        close OUT;
    }
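The script restricts each query to a single day by appending a `daterange:` term built from a Julian day number. As a rough sketch of that arithmetic using core Perl only: the helper names `julian_day_utc` and `daterange_term` are hypothetical, and this version works in UTC, whereas the script's `local_julian_day` from Time::JulianDay uses local time, so the two can differ by a day near midnight.

```perl
use strict;
use warnings;

# Hypothetical helper: Julian day number of the UTC calendar date that
# contains the given epoch time. 1970-01-01 is Julian day 2440588.
sub julian_day_utc {
    my ($epoch) = @_;
    return int($epoch / 86400) + 2440588;
}

# Hypothetical helper: build a single-day daterange term for the day
# that lies $days_back days before the date of $epoch.
sub daterange_term {
    my ($epoch, $days_back) = @_;
    my $jd = julian_day_utc($epoch) - $days_back;
    return "daterange:$jd-$jd";
}

# Same offset as the script (two days back from now).
print daterange_term(time, 2), "\n";
```

Printing the term before sending the query is a cheap way to confirm which day is actually being requested.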

I am positive that my API key is correct, and I have the .pl file, the query file, and the WSDL file all in the same directory. When I run this code with the search terms "Windows Vista", for example, the script runs and creates a results file as described, but the generated CSV file is empty.

I am stumped as to how to get this to work properly; I am running AS Perl 5.8 on WinXP SP2. Any suggestions as to where I am going wrong here would be greatly appreciated.

Replies are listed 'Best First'.
Re: Monitor queries for new finds added to the Google index yesterday
by kwaping (Priest) on Nov 03, 2006 at 20:21 UTC
    After your line beginning with my $results = $google_search, I recommend adding the following:
    use Data::Dumper::Simple;
    print Dumper($results);
    That will be a big help in your debugging. (You may need to install Data::Dumper::Simple first.) Also, I recommend you consider using Text::CSV to create your CSV file.

    ---
    It's all fine and dandy until someone has to look at the code.
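kwaping's Text::CSV suggestion might look roughly like the sketch below. It assumes Text::CSV is installed from CPAN (it is not a core module); the helper name `csv_line` and the sample field values are made up for illustration. The tag and newline stripping mirrors the posted script, while the quote doubling is left to the module.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, must be installed separately

# always_quote mimics the posted script, which quotes every field.
my $csv = Text::CSV->new({ binary => 1, always_quote => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical helper: clean copies of the fields and return one CSV line.
sub csv_line {
    my @fields = @_;          # copies, so the caller's data is untouched
    for (@fields) {
        s/\n//g;              # drop spurious newlines
        s/<.+?>//g;           # drop HTML tags
    }
    $csv->combine(@fields)
        or die "CSV error: " . $csv->error_diag;
    return $csv->string;      # embedded " marks are doubled by Text::CSV
}

# Made-up sample row standing in for title, URL and snippet.
print csv_line('<b>Windows</b> "Vista"', 'http://example.com/',
               "A snippet,\nwith a comma"), "\n";
```

In the script itself, the returned line would be printed to the OUT handle in place of the hand-built `join`.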
Re: Monitor queries for new finds added to the Google index yesterday
by duckyd (Hermit) on Nov 03, 2006 at 22:05 UTC
    You aren't checking to see if your call worked. You should be doing something like:
    my $result = $google_search->doGoogleSearch(...);
    if( $result->fault ){
        die "Oops, our soap call failed: ".$result->faultstring;
    }
    # No fault, do stuff with your $result
      Good point. When I was working on this code before posting it to PM, I had changed the SOAP::Lite call to include its trace functionality as follows:

      use SOAP::Lite +trace;

      In order to get that to work, I had to comment out the "use strict;" line. Doing the above showed me that I had miskeyed my Google API key. I fixed that and then removed the "+trace" from the SOAP::Lite call. I've updated the code to include a line to flag failure in the SOAP::Lite call along the lines of your suggestion.
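One thing to watch when adding such a failure flag: a test of the form `if ($results => "")` never fires. In Perl `=>` is a comma, not a comparison, so the condition evaluates to its last operand, the empty string, which is false. The sketch below contrasts that form with a plain definedness test. Both helper names are hypothetical, and treating an undefined result as a failed call is an assumption about how the generated service stub behaves, not something confirmed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: treat an undefined result as a failed call.
# (Assumption: the service stub returns undef when the call faults.)
sub call_failed {
    my ($results) = @_;
    return defined $results ? 0 : 1;
}

# The comma form: the condition is just the last operand, "",
# so this returns 0 no matter what $results holds.
sub comma_check_fires {
    my ($results) = @_;
    no warnings 'void';    # silence "useless use ... in void context"
    return ($results => "") ? 1 : 0;
}

for my $r (undef, { resultElements => [] }) {
    my $label = defined $r ? 'hashref' : 'undef';
    printf "%-8s comma form fires: %d, defined test fires: %d\n",
        $label, comma_check_fires($r), call_failed($r);
}
```

With the comma form, a bad API key would lead to an empty CSV file instead of an error message.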
