Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

It's been a long time since I wrote a script in Perl. I need to know the current Perl options for scraping HTML pages from JavaScript- and Ajax-driven sites. Has this capability been added to the WWW::Mechanize module, or are there other modules for this?

I know about WWW::Mechanize::Firefox, but I need to run the script in a headless environment. Is there a headless version of WWW::Mechanize::Firefox? I read somewhere about WWW::Mechanize::NodeJs by Corion. Is that available yet? How about WWW::HtmlUnit, WWW::Scripter and WWW::WebKit? (Sorry, I don't know much about these modules!)

I prefer lightweight options. Pls can any Monks advise me regarding this.

Best regards,

Maxwell


Replies are listed 'Best First'.
Re: Scraping HTML from Javascript & ajax driven sites!
by tobyink (Canon) on May 21, 2013 at 10:23 UTC

    If you're just grabbing data from one particular site, and don't need a generic solution that will work for all Javascript-driven sites, then there's another idea you should consider.

    Remember that the Javascript isn't magic. Study its source and figure out what it's doing. It's probably requesting data in JSON from some predictable URLs. With JSON and HTTP::Tiny you could probably make the same requests using Perl.
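
    As a rough sketch of that approach (not tested against any real site): the code below fetches a URL with HTTP::Tiny and decodes the JSON body. It uses JSON::PP because it ships with Perl; the JSON module exports the same decode_json function. The helper names fetch_json and item_names, the X-Requested-With header, and the { items => [ { name => ... } ] } payload shape are assumptions for illustration only. Use your browser's network inspector to see what the site really requests and returns.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Fetch a URL and return the decoded JSON structure; dies on HTTP failure.
sub fetch_json {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->get($url, {
        headers => {
            'Accept'           => 'application/json',
            # Assumption: some Ajax endpoints expect this header; drop it if not needed.
            'X-Requested-With' => 'XMLHttpRequest',
        },
    });
    die "GET $url failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return decode_json($response->{content});
}

# Hypothetical payload shape: { items => [ { name => ... }, ... ] }.
# Returns the list of names, or an empty list if there are no items.
sub item_names {
    my ($data) = @_;
    return map { $_->{name} } @{ $data->{items} || [] };
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    print "$_\n" for item_names(fetch_json($ARGV[0]));
}
```

    Run it with the URL the page's JavaScript requests as the only argument. If the site needs cookies or a POST body, HTTP::Tiny's cookie_jar option and its post or request methods cover those cases.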

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Scraping HTML from Javascript & ajax driven sites!
by Corion (Patriarch) on May 21, 2013 at 08:25 UTC

    As a long-term project, I plan to port the API of WWW::Mechanize::Firefox to use PhantomJS and Ghostdriver. So far, I've written only a very rough prototype.

    All my code releases are in my CPAN directory, so if a module is not there, it's highly unlikely that it is available elsewhere.

      Thx. Corion!
Re: Scraping HTML from Javascript & ajax driven sites!
by Anonymous Monk on May 21, 2013 at 08:25 UTC

    Pls can any Monks advise me regarding this.

    Probably not :) although all the things you mention have webpages where they detail the progress they make, and all you have to do is compare them :)

      Thx. I will do that! Just wondered whether someone had tried them and found a good one.. :)