Hi all, thanks in advance for all the precious knowledge you've been sharing so far!

I am a newbie at Perl, and I am trying to write a script that:

1) searches Google Scholar for some keywords stored in a text file;

2) opens the first "Cited by..." link that appears in the results;

3) scrapes the whole results page that follows (title, info, and number of citations of each paper).
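A minimal sketch of step 1, assuming the text file holds one query per line and is UTF-8 encoded. The helper names (`percent_encode`, `scholar_url`, `read_keywords`) and the file name passed on the command line are made up for illustration; only the base URL is taken from the script in this post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode everything except unreserved URL characters.
sub percent_encode {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Hypothetical helper: turn one line of keywords into a Scholar search URL.
sub scholar_url {
    my ($keywords) = @_;
    my $query = join '+', map { percent_encode($_) } split ' ', $keywords;
    return "http://scholar.google.it/scholar?hl=en&q=$query";
}

# Hypothetical helper: read non-empty lines from the keyword file.
sub read_keywords {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my @lines = grep { /\S/ } <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print scholar_url($_), "\n" for read_keywords($ARGV[0]);
}
```

Run with the keyword file as the only argument, it prints one search URL per line; each URL could then be passed to `$mech->get(...)`.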

This is what I wrote so far:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use LWP::UserAgent;
    use Web::Scraper;

    my $mech = WWW::Mechanize->new();
    $mech->get("http://scholar.google.it/scholar?hl=en&q=Handbuch+der+biologischen+Arbeitsmethoden");
    my $response = $mech->follow_link( url_regex => qr/cites/i, n => 1 );
    my $result = $response->decoded_content;
    my $indi = $mech->uri();

    open(F3, '>results.txt') or die "$!";

    my $out = scraper {
        process ".gs_rt", "title[]" => scraper {
            process ".gs_a", "info" => 'TEXT';
            process ".gs_fl", "cites" => 'TEXT';
        };
    };

    my $res = $out->scrape($result, $indi);
    for my $out (@{$res->{out}}) {
        print F3 "$out->{title} $out->{info} $out->{info}\n";
    }
    sleep(3);
    close(F3);

The line:

my $res = $out->scrape($result, $indi);

however, gives me the following error:

Can't locate object method "new" via package "HTML::TreeBuilder::XPath" at /System/Library/Perl/Extras/5.10.0/Web/Scraper.pm line 115, <F1> line 1.

I have searched the Internet and found no answer. I updated my version of XPath and tried scrape(URI->new($indi)) instead, but nothing works. I am quite desperate! I suspect there is a bug in the XPath.pm file, because I followed exactly the scraping code shown in the CPAN documentation for Web::Scraper.
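An error of the form "Can't locate object method "new" via package ..." generally means the package has no `new` defined at that point, which is often a sign that the module never loaded rather than that it contains a bug. A small check along these lines might help tell the two cases apart. This is only a sketch: `module_status` is a made-up helper, and the list of modules is my guess at what is relevant here (XML::XPathEngine is included because HTML::TreeBuilder::XPath depends on it).

```perl
use strict;
use warnings;

# Report whether a module loads and, if so, its version and source file.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    if (eval { require $file; 1 }) {
        my $version = eval { $module->VERSION };
        $version = 'unknown' unless defined $version;
        return "$module loaded, version $version, from $INC{$file}";
    }
    my $error = $@;
    $error =~ s/\s+\z//;
    return "$module failed to load: $error";
}

print module_status($_), "\n"
    for qw(HTML::TreeBuilder::XPath HTML::TreeBuilder XML::XPathEngine Web::Scraper);
```

If HTML::TreeBuilder::XPath is reported as failing to load, the message after the colon should say why (for example a missing dependency, or a copy installed for a different perl than the one in /System/Library/Perl).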

If you could help me, you would have my eternal gratitude.

Thanks a lot in advance!!


In reply to Problem in using Web::Scraper, coming from HTML::TreeBuilder::XPath by sbasbasba
