Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
And before some of you proclaim that we're not allowed to scrape Google, you're wrong. If you sign up, they allow X crawls per day, and we're well within our limit on this one.
Basically, we have a separate $response for each results offset, since Google returns up to 950 results per search query. We want to cut this down, but I have no idea how.

    my $response = $ua->get("http://www.google.com/search?num=50&hl=en&lr=&safe=off&rls=GGLD%2CGGLD%3A2005-12%2CGGLD%3Aen&q=$search");
    if ($response->is_success) { &parser(0); } else { print $response->status_line, "\n"; }
    my $response1 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=100&sa=N");
    if ($response1->is_success) { &parser(1); } else { print $response1->status_line, "\n"; }
    my $response2 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=150&sa=N");
    if ($response2->is_success) { &parser(2); } else { print $response2->status_line, "\n"; }
    my $response3 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=200&sa=N");
    if ($response3->is_success) { &parser(3); } else { print $response3->status_line, "\n"; }
    my $response4 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=250&sa=N");
    if ($response4->is_success) { &parser(4); } else { print $response4->status_line, "\n"; }
    my $response5 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=300&sa=N");
    if ($response5->is_success) { &parser(5); } else { print $response5->status_line, "\n"; }
    my $response6 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=350&sa=N");
    if ($response6->is_success) { &parser(6); } else { print $response6->status_line, "\n"; }
    my $response7 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=400&sa=N");
    if ($response7->is_success) { &parser(7); } else { print $response7->status_line, "\n"; }
    my $response8 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=450&sa=N");
    if ($response8->is_success) { &parser(8); } else { print $response8->status_line, "\n"; }
    my $response9 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=500&sa=N");
    if ($response9->is_success) { &parser(9); } else { print $response9->status_line, "\n"; }
    my $response10 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=550&sa=N");
    if ($response10->is_success) { &parser(10); } else { print $response10->status_line, "\n"; }
    my $response11 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=600&sa=N");
    if ($response11->is_success) { &parser(11); } else { print $response11->status_line, "\n"; }
    my $response12 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=650&sa=N");
    if ($response12->is_success) { &parser(12); } else { print $response12->status_line, "\n"; }
    my $response13 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=700&sa=N");
    if ($response13->is_success) { &parser(13); } else { print $response13->status_line, "\n"; }
    my $response14 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=750&sa=N");
    if ($response14->is_success) { &parser(14); } else { print $response14->status_line, "\n"; }
    my $response15 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=800&sa=N");
    if ($response15->is_success) { &parser(15); } else { print $response15->status_line, "\n"; }
    my $response16 = $ua->get("http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off&start=950&sa=N");
    if ($response16->is_success) { &parser(16); } else { print $response16->status_line, "\n"; }
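One way to cut the repeated blocks down to one is to loop over the offsets and hand each page straight to the parser. This is only a sketch under stated assumptions, not a drop-in replacement: results_url and fetch_pages are hypothetical helper names, $ua is assumed to be an LWP::UserAgent (or anything whose get method returns an HTTP::Response-like object), $search is assumed to be URL-escaped already as in the post, the extra rls parameter from the first request is dropped, and parser is assumed to be changed so it accepts the page content as its first argument. It also calls status_line outside the double quotes, because Perl does not interpolate method calls inside strings.

```perl
use strict;
use warnings;

# Hypothetical helper: build the results URL for one offset, reusing the
# query parameters from the post. $search is assumed to be URL-escaped
# already, as it is in the original code.
sub results_url {
    my ($search, $start) = @_;
    return "http://www.google.com/search?q=$search&num=50&hl=en&lr=&safe=off"
         . "&start=$start&sa=N";
}

# Hypothetical helper: fetch every offset in one loop instead of one copied
# block per offset. $ua only needs a get() method returning an object with
# is_success, content and status_line (e.g. LWP::UserAgent / HTTP::Response).
# Each successful page's content is passed straight to $parser, so no
# per-offset $responseN variable is needed. Failures are collected and
# returned as strings.
sub fetch_pages {
    my ($ua, $search, $offsets, $parser) = @_;
    my @errors;
    for my $start (@$offsets) {
        my $response = $ua->get(results_url($search, $start));
        if ($response->is_success) {
            $parser->($response->content, $start);
        }
        else {
            # Method calls do not interpolate in double-quoted strings,
            # so status_line is concatenated rather than embedded.
            push @errors, "start=$start: " . $response->status_line;
        }
    }
    return @errors;
}
```

With a real user agent this might be called as fetch_pages(LWP::UserAgent->new, $search, [ map { $_ * 50 } 0 .. 18 ], \&parser); assuming 50 results per page, that requests offsets 0 through 900 and so covers up to 950 results. Returning the errors instead of printing inside the loop keeps the fetching separate from the reporting, and adding or removing an offset becomes a one-element change to the list.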
In &parser, the number is shifted off the argument list, and then we have this:
    sub parser {
        my $count = shift;
        my $google_results;
        if    ($count eq "0")  { $google_results = $response->content; }
        elsif ($count eq "1")  { $google_results = $response1->content; }
        elsif ($count eq "2")  { $google_results = $response2->content; }
        elsif ($count eq "3")  { $google_results = $response3->content; }
        elsif ($count eq "4")  { $google_results = $response4->content; }
        elsif ($count eq "5")  { $google_results = $response5->content; }
        elsif ($count eq "6")  { $google_results = $response6->content; }
        elsif ($count eq "7")  { $google_results = $response7->content; }
        elsif ($count eq "8")  { $google_results = $response8->content; }
        elsif ($count eq "9")  { $google_results = $response9->content; }
        elsif ($count eq "10") { $google_results = $response10->content; }
        elsif ($count eq "11") { $google_results = $response11->content; }
        elsif ($count eq "12") { $google_results = $response12->content; }
        elsif ($count eq "13") { $google_results = $response13->content; }
        elsif ($count eq "14") { $google_results = $response14->content; }
        elsif ($count eq "15") { $google_results = $response15->content; }
        elsif ($count eq "16") { $google_results = $response16->content; }
        # ... (the rest of the sub was not included in the post)
    }
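If the rest of the script should keep calling parser with a number, a smaller change is to store the responses in one array so the number selects the page directly. This is a sketch under assumptions: @responses is a hypothetical name, the fetching code is assumed to assign $responses[$i] = $ua->get(...) instead of creating $response through $response16, and the body stands in for the parsing code that the post does not show.

```perl
use strict;
use warnings;

# Hypothetical restructuring: every response goes into one array instead of
# $response, $response1, ... $response16. The array index plays the role
# of $count from the original post.
my @responses;

sub parser {
    my $count = shift;
    # A single array lookup replaces the seventeen-way if/elsif chain.
    my $google_results = $responses[$count]->content;
    # The original parsing code would work on $google_results here;
    # this sketch simply returns it.
    return $google_results;
}
```

Because the index is used numerically, there is no need for string comparisons such as $count eq "16", and adding another offset no longer means adding another elsif branch.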
Janitored by holli - added readmore-tag
Re: Making script more efficient
by dragonchild (Archbishop) on May 26, 2005 at 19:09 UTC
by Anonymous Monk on May 26, 2005 at 19:31 UTC
by dragonchild (Archbishop) on May 26, 2005 at 19:35 UTC
by thundergnat (Deacon) on May 26, 2005 at 21:09 UTC

Re: Making script more efficient
by Fletch (Bishop) on May 26, 2005 at 19:22 UTC

Re: Making script more efficient
by mrborisguy (Hermit) on May 26, 2005 at 19:12 UTC