madsoeni has asked for the wisdom of the Perl Monks concerning the following question:
However, after a while I get stopped and the script returns 0 results, even though I have changed the waiting time between requests. Does anyone know what I am doing wrong? Thank you in advance.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;    # scraper package

# waiting time between requests, in seconds
my $sleep_per_obs = 5;

# WWW::Mechanize agent; onerror => undef means a failed request does not die
my $agent = WWW::Mechanize->new(onerror => undef);
# identify as the Safari browser
$agent->agent_alias('Mac Safari');

# target file
my $target = 'data_uk.txt';
print "Data will save to $target \n";
open(my $out, '>', $target) or die "Sorry, couldn't open $target for writing: $!\n";

# term to search
my $term = "hello";
print "Search term is $term.\n";

for my $year (2012 .. 2014) {
    for my $month (1 .. 12) {
        for my $day (1 .. 31) {
            my $url = "https://www.google.com/search?q=$term&hl=en&gl=uk&authuser=0&sa=X&ei=xXJuUp6tMoLcyQGxp4GgCw&source=lnt&cr=countryUK&tbs=cdr%3A1%2Ccd_min%3A$month%2F$day%2F$year%2Ccd_max%3A$month%2F$day%2F$year&tbm=nws";
            # print "URL is $url\n";
            $agent->get($url);
            # with onerror => undef a failed request must be checked by hand
            my $content = $agent->success ? $agent->content() : '';

            # capture the count (thousands and hundreds = $1 and $2); default
            # to 0 on no match so a stale $1 from an earlier page is never reused
            my $combo = 0;
            if ($content =~ /(\d+),*(\d*) results/) {
                my ($results1, $results2) = ($1, $2);
                $combo = $results2 eq "" ? $results1 : $results1 * 1000 + $results2;
            }

            print "Number of results $day-$month-$year : $combo \n";
            print $out "$day-$month-$year: $combo \n";
            sleep $sleep_per_obs;
        }
    }
}
close $out;
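Two spots in the script above look fragile regardless of any blocking: the `(\d+),*(\d*) results` match only copes with counts below one million, and the day loop runs to 31 for every month, so it requests dates that do not exist. The following is a minimal sketch of how both could be tightened. The helper names `parse_result_count` and `days_in_month` are hypothetical, not part of WWW::Mechanize, and the assumption that the page text contains a phrase like "1,234,567 results" comes from the original regex rather than from any documented page format.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the hit count out of page text such as
# "About 1,234,567 results". Returns undef when no count is present,
# so a blocked or empty page is not mistaken for a genuine zero.
sub parse_result_count {
    my ($content) = @_;
    return unless defined $content;
    return unless $content =~ /\b(\d{1,3}(?:,\d{3})*|\d+)\s+results?\b/;
    (my $count = $1) =~ tr/,//d;    # drop the thousands separators
    return $count + 0;
}

# Hypothetical helper: days in a month under the Gregorian leap-year rule,
# so the loop never builds a date such as 31 February.
sub days_in_month {
    my ($year, $month) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    return $month == 2 && $leap ? 29 : $days[$month - 1];
}
```

In the loop, the inner range would then become `1 .. days_in_month($year, $month)`, and an undefined return from `parse_result_count` could be logged as "no count found" instead of being written to the file as 0.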
Replies are listed 'Best First'.

Re: Count of articles Google News
by ww (Archbishop) on Feb 28, 2015 at 16:50 UTC
by madsoeni (Initiate) on Mar 01, 2015 at 11:25 UTC
by ww (Archbishop) on Mar 01, 2015 at 12:28 UTC
by madsoeni (Initiate) on Mar 01, 2015 at 18:26 UTC

Re: Count of articles Google News
by CoVAX (Beadle) on Feb 28, 2015 at 23:55 UTC
by LanX (Saint) on Mar 01, 2015 at 00:48 UTC
by madsoeni (Initiate) on Mar 01, 2015 at 11:27 UTC