dime has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I am trying to get the title, summary, original URL and thumbnail URL of a set number of Google images via a CGI. For Yahoo, it works just fine, but Google is giving me headaches. I suspect it has something to do with proxy settings (for Yahoo, I use LWP::UserAgent; for Google I don't know how to set a proxy). I saw the nifty Scrape Google's Image Search program, but this time I don't actually want to download the pics, and I need more context, so I decided against using it. I went with this code:

    my $json = new JSON;
    my $google = WebService::Simple->new(
        base_url => "http://ajax.googleapis.com/ajax/services/search/images",
        param    => { api_key => $key },
    );
    for (my $i = 0; $i < $number; $i += 4) {
        my $hashref1 = $google->get(
            { v => "0.1", q => "$query", rsz => "small", hl => "ja", start => "$i" }
        ) or die "Could not get google images: $!\n";
    }

This works in a regular perl script. But in my cgi, it fails. Putting the above code in a separate script which is called by the cgi, writes to file and allows for the cgi to read the file also fails. Is this likely to be the fault of the firewall? How do I get around it?

Many thanks for any help!

- Dime

Replies are listed 'Best First'.
Re: Google Images via CGI
by Sewi (Friar) on Sep 03, 2009 at 07:46 UTC
    The most important paragraph of your post is the following:
    This works in a regular perl script. But in my cgi, it fails. Putting the above code in a separate script which is called by the cgi, writes to file and allows for the cgi to read the file also fails.
    Whenever you experience such a situation, it's time for print (or any other command which allows you to get debug output).

    Add print STDERR lines to your program to find out what is happening where.

  • Is $google undef?
  • Is $key different?
  • What is in $hashref1 (Data::Dumper may be a good help for this)?
  • Is your program not looping for any reason?
  • If everything else fails, compare the environments (%ENV) as this is the biggest difference between a shell and a CGI situation
  • Your program may also succeed, but the web server may cancel the request before any output gets through (usually after 5 minutes, but this is configurable down to 1 second). If your last line, print STDERR "Done\n";, isn't shown, this could be the reason.

  • WebService::Simple doesn't provide a ->dump method as far as I can tell, but using Data::Dumper's Dumper() function on the object may be worth a try. Expect a huge output, but looking through it may turn up an HTTP error message or something else useful.
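The checklist above could be sketched roughly as follows. This is only an illustration using core modules: `debug_dump` is a hypothetical helper name, and `$key` and `$hashref1` are placeholders standing in for the variables in the original script.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sorted keys and light indentation make two dumps easier to compare.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hypothetical helper: return a labelled dump of any value.
sub debug_dump {
    my ($label, $value) = @_;
    return "[debug] $label: " . Dumper($value);
}

# Placeholders for the variables in the original script.
my $key = 'example-key';
my $hashref1;    # stays undef if get() never returns

# Under CGI, STDERR normally ends up in the web server's error log.
print STDERR debug_dump('key',      $key);
print STDERR debug_dump('hashref1', $hashref1);
print STDERR debug_dump('ENV',      \%ENV);
print STDERR "Done\n";
```

Running the same lines once from the shell and once under the web server gives two logs that can be compared side by side.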

    If everything fails and you don't get any new ideas, please add an exact description of what you mean by "fails", including the error message, if any.

      My code is spiked with print statements after nearly every line, but I figured you wouldn't want to read them. $google is defined, $key is copied from the other program (which works) and is exactly identical, and $hashref1 never happens. The error message I get is

       request to http://ajax.googleapis.com/ajax/services/search/images?hl=ja&rsz=small&q=%E4%BA%BA&v=0.1&api_key=[my key]&start=0 failed at [my program] line 59

      for line 59 being

      my $hashref1 = $google->get( {v => "0.1", q => "$query", rsz => "small", hl => "ja", start => "$i"} )

      I get this error message after some two minutes or so of nothing happening, which is exactly how my programs reacted before to a forgotten proxy. That is why I assumed it was a firewall problem, again.

      %ENV is indeed very different - none of the keys seem to be the same. I don't know how that helps me, though. Please explain?
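One way the %ENV difference could matter here (an assumption, since the thread shows neither environment): a login shell often has variables such as http_proxy set, while a web server usually starts CGI programs with a minimal environment that lacks them. A small sketch that lists only the proxy-related variables, so the two environments can be compared; `proxy_vars` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the proxy-related entries of an environment
# hash, matching names such as http_proxy or NO_PROXY case-insensitively.
sub proxy_vars {
    my ($env) = @_;
    my @names = grep { /^(?:https?|ftp|no|all)_proxy$/i } keys %$env;
    return { map { $_ => $env->{$_} } @names };
}

my $found = proxy_vars(\%ENV);
print STDERR "proxy-related environment variables:\n";
print STDERR "  $_=$found->{$_}\n" for sort keys %$found;
print STDERR "  (none)\n" unless %$found;
```

If the shell run lists a proxy and the CGI run prints "(none)", the CGI has to be told about the proxy explicitly.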

        That error message sucks because it doesn't say why the request failed. Try turning on the debug option; maybe you will get a better message.
        Very often a
        print Dumper($google)."\n";
        helps. It usually contains the request and the reply.

        Not a Perl solution, but when debugging SOAP-crap very often

        tcpdump -nXs 8192 -c 1000 port 80 >dump.log
        also helps a lot, because you get the raw data stream. You may need to be more specific with the filter conditions, depending on the load of your computer.
Re: Google Images via CGI
by Anonymous Monk on Sep 03, 2009 at 07:26 UTC
    show the error message, and turn on debugging.
Re: Google Images via CGI
by dime (Novice) on Sep 07, 2009 at 02:36 UTC

    I just realized that WebService::Simple _is_ a kind of UserAgent, so setting the proxy is as simple as this:

    $google->proxy(...);

    *head-desk* I will be very sure to study the documentation thoroughly when next I run into problems with some unknown module. Anyway, my CGI now works as intended and I got some useful new ideas for the next time I need to debug something, so thank you all for the help. : )
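For readers who land here with the same problem, a minimal sketch of that fix using plain LWP::UserAgent, whose proxy method the post says WebService::Simple shares. The proxy address is a placeholder, not the one from the thread, and the shorter timeout is an optional addition.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# LWP's default timeout is 180 seconds; a shorter one makes a missing
# proxy show up sooner.
my $ua = LWP::UserAgent->new( timeout => 30 );

# Placeholder proxy address: substitute the real host and port.
$ua->proxy( ['http'], 'http://proxy.example.com:8080/' );

# Alternative: read http_proxy / no_proxy from the environment instead.
# Under CGI these variables are often unset, so an explicit proxy() call
# is the safer choice there.
# $ua->env_proxy;

# Called with only a scheme, proxy() returns the current setting.
print STDERR "HTTP proxy in use: ", $ua->proxy('http'), "\n";
```

The same proxy(...) call on the $google object from the original question is what the post above describes.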