sbasbasba has asked for the wisdom of the Perl Monks concerning the following question:
Hi all, thanks in advance for all the precious knowledge you've been sharing so far!
I am a newbie at Perl, and I am trying to write a script that:
1) searches Google Scholar for some keywords stored in a text file;
2) opens the first "Cited by..." link that appears in the results;
3) scrapes the whole page that the link leads to (title, info, and number of citations for each paper).
This is what I wrote so far:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use LWP::UserAgent;
use Web::Scraper;

my $mech = WWW::Mechanize->new();
$mech->get("http://scholar.google.it/scholar?hl=en&q=Handbuch+der+biologischen+Arbeitsmethoden");

# Follow the first "Cited by ..." link on the results page.
my $response = $mech->follow_link( url_regex => qr/cites/i, n => 1 );
my $result   = $response->decoded_content;
my $indi     = $mech->uri();

open(F3, '>', 'results.txt') or die "Cannot open results.txt: $!";

# Each ".gs_rt" match becomes one entry in the "title" list,
# holding the keys "info" and "cites".
my $out = scraper {
    process ".gs_rt", "title[]" => scraper {
        process ".gs_a",  "info"  => 'TEXT';
        process ".gs_fl", "cites" => 'TEXT';
    };
};

my $res = $out->scrape($result, $indi);

# The scraper stores its results under "title", not "out".
for my $paper (@{ $res->{title} || [] }) {
    print F3 "$paper->{info} $paper->{cites}\n";
}
sleep(3);
close(F3);
The line:
my $res = $out->scrape($result, $indi);
however, gives me the following error:
Can't locate object method "new" via package "HTML::TreeBuilder::XPath" at /System/Library/Perl/Extras/5.10.0/Web/Scraper.pm line 115, <F1> line 1.
I have searched the Internet and found no answer. I updated my version of XPath and I tried to use scrape(URI->($indi)); but nothing works. I am quite desperate! I have the feeling that there is a bug in the XPath.pm file, because I have been following exactly the same scraping code that I see in the CPAN guide for Web::Scraper.
If you could help me, you would have my eternal gratitude.
Thanks a lot in advance!!
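For reference, here is a minimal sketch of how one might check whether HTML::TreeBuilder::XPath can be loaded at all by the same perl, and whether it provides new(). It is a diagnostic only, not a fix. The helper name check_module is hypothetical and is not part of Web::Scraper or any CPAN module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at runtime and report
# whether it loaded, from which file, and whether it has a new() method.
sub check_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    my $loaded = eval { require $file; 1 } ? 1 : 0;
    my $error  = $loaded ? '' : "$@";          # capture $@ right after the eval
    return {
        module  => $module,
        loaded  => $loaded,
        path    => $loaded ? $INC{$file} : undef,
        has_new => ( $loaded && $module->can('new') ) ? 1 : 0,
        error   => $error,
    };
}

my $r = check_module('HTML::TreeBuilder::XPath');
if ( $r->{loaded} ) {
    print "$r->{module} loaded from $r->{path}\n";
    print "new() is ", ( $r->{has_new} ? "available" : "missing" ), "\n";
}
else {
    print "Could not load $r->{module}: $r->{error}\n";
}
```

If the module loads but from an unexpected path, that may point to two copies of it installed in different directories of @INC.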