jekyll has asked for the wisdom of the Perl Monks concerning the following question:

\o,

I need a script to fetch Google results (the unshortened URLs) for some automation.

However, it looks like Google does not let me do that; not even one single result:

#!/usr/bin/perl

use HTML::TreeBuilder::XPath;
use HTML::StripTags qw(strip_tags);

use strict;
use warnings;

my $search_url = "https://encrypted.google.com/search?hl=de&q=".$ARGV[0];
my $tree = HTML::TreeBuilder::XPath->new_from_url($search_url);
my $anchor = $tree->findnodes('//h3[@class="r"]/a');
my $href = URI->new_abs($anchor->[0]->getValue, $search_url);

print strip_tags($href,());

> perl moo.pl test
GET failed on https://encrypted.google.com/search?hl=de&q=test: 403 Forbidden at moo.pl line 10.

Why? And how can I get the actual result(s) instead?

TIA.

Regards and all that,
jkl

Re: Trying to fetch Google results via TreeBuilder: 403 Forbidden
by Anonymous Monk on Oct 14, 2015 at 09:08 UTC

    Why?

    Because you're not asking in the way that Google likes to be asked (the headers, the request: it's not what Google wants).

    And how can I get the actual result(s) instead?

    Consult Google's documentation.

    If it works from Firefox, duplicate the Firefox headers. You may have to use WWW::Mechanize for the actual fetching instead of new_from_url.
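
    A minimal sketch of that idea, in case it helps. It fetches the page with WWW::Mechanize using a browser-like User-Agent and then parses the returned HTML instead of calling new_from_url. The search URL and the XPath are taken from the question; the 'Windows Mozilla' agent alias and the helper name extract_links are my own assumptions for illustration. Google may still refuse the request or serve different markup, so treat this as a starting point only.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    use WWW::Mechanize;
    use HTML::TreeBuilder::XPath;
    use URI;
    use URI::Escape qw(uri_escape_utf8);

    # Hypothetical helper: return absolute URLs of the result links in $html.
    # The XPath is the one from the question; current Google markup may differ.
    sub extract_links {
        my ($html, $base_url) = @_;
        my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
        my @links = map { URI->new_abs($_->attr('href'), $base_url)->as_string }
                    $tree->findnodes('//h3[@class="r"]/a[@href]');
        $tree->delete;    # free the parse tree
        return @links;
    }

    # Run the fetch only when executed as a script, not when loaded as a library.
    unless (caller) {
        @ARGV or die "usage: $0 QUERY\n";
        my $query      = shift @ARGV;
        my $search_url = 'https://encrypted.google.com/search?hl=de&q='
                       . uri_escape_utf8($query);

        # autocheck makes get() die on HTTP errors such as 403.
        my $mech = WWW::Mechanize->new(autocheck => 1);
        $mech->agent_alias('Windows Mozilla');    # assumed browser-like User-Agent
        $mech->get($search_url);

        print "$_\n" for extract_links($mech->content, $search_url);
    }
    ```

    Keeping the parsing in its own sub means you can try the XPath against a saved copy of the page without hitting Google again. The query is also URL-escaped here, which the original script did not do.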

      \o,

      Google's documentation suggests that I build my own Custom Search. That's annoying.

      WWW::Mechanize works indeed, thanks. :-)

      Regards and all that,
      jkl