SergioQ has asked for the wisdom of the Perl Monks concerning the following question:

So I have searched for how to scrape webpages with a Perl script when the page relies on JavaScript, so LWP::Simple will not do.

The closest I've come is finding WWW::Mechanize::Firefox, yet I can't get around the "Failed to connect to , problem connecting to "localhost", port 4242" error. From what I've seen, MozRepl is needed yet no longer around?

All I want is to create a Perl script that runs from the command line, is given a URL, and can scrape all the images. What I get now however is just the thumbnails since (I'm guessing) JavaScript does the rest after the page loads. The best example I can give is this: if I go to a URL that searches the web for movie posters, what I pull down with LWP::Simple is nothing compared to what I get if I manually go to the browser and "View Source." That's where the meat all is, and I am wondering if there's a workable solution for a newbie to Perl?


Re: How to have Perlscript scrape images from a URL that has Javascript?
by marto (Cardinal) on Dec 16, 2019 at 07:37 UTC

    'The closest I've come is finding WWW::Mechanize::Firefox, yet I can't get around the "Failed to connect to , problem connecting to "localhost", port 4242" error. From what I've seen, MozRepl is needed yet no longer around?'

    You may have missed IMPORTANT NOTICE, WWW::Mechanize::Firefox is unlikely to work unless you're using an old version of the browser. Perhaps WWW::Mechanize::Chrome is of interest.
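    A minimal sketch of the WWW::Mechanize::Chrome route might look like the following. It assumes a local Chrome/Chromium install and the module's `new`/`get`/`content` methods (check the module's documentation against your installed version); the `headless` option and the fixed two-second wait are assumptions to tune. Run it with a URL argument to drive the browser; with no arguments it just demonstrates the extraction step on a canned HTML sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull every src="..." out of an HTML string. A real parser
# (HTML::TreeBuilder, Mojo::DOM) is more robust; a regex is
# enough for a quick sketch.
sub extract_img_srcs {
    my ($html) = @_;
    return $html =~ /<img[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
}

if (@ARGV) {
    # Assumes WWW::Mechanize::Chrome plus a Chrome/Chromium binary
    # are installed. Unlike LWP::Simple, this runs the page's
    # JavaScript before we look at the markup.
    require WWW::Mechanize::Chrome;
    my $mech = WWW::Mechanize::Chrome->new( headless => 1 );
    $mech->get( $ARGV[0] );
    sleep 2;    # crude wait for scripts to finish loading images; tune as needed
    print "$_\n" for extract_img_srcs( $mech->content );
}
else {
    # Demo mode: show the extraction step working on canned HTML.
    my $sample = '<p><img src="http://example.com/a.jpg"><img alt="x" src="/b.png"></p>';
    print "$_\n" for extract_img_srcs($sample);
}
```

    Note that `src` values can be relative (like `/b.png` above); they would need resolving against the page URL before downloading.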

    'What I get now however is just the thumbnails since (I'm guessing) that JavaScript does'

    Don't guess; check what the page actually does (the browser's developer tools will show you) so you can be sure.

    'wondering if there's a workable solution for a newbie to Perl?'

    Depending on the target site there could be an API available. IIRC this is how the scrapers used by Kodi work. Failing that there may be code out there specifically for this site, but you don't say which site it is.
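    If an API does exist, the scraping problem mostly disappears: fetch JSON over plain HTTP (no JavaScript involved) and pick out the image URLs. The sketch below uses a canned response standing in for a purely hypothetical poster-search endpoint; real field names and endpoints will differ per site. In practice you would fetch the JSON with LWP::UserAgent's `get` and decode `$response->decoded_content`.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

# Canned response standing in for a hypothetical poster-search API;
# the structure and field names here are invented for illustration.
my $json = '{"results":[{"title":"Alien","poster":"http://example.com/alien.jpg"},'
         . '{"title":"Brazil","poster":"http://example.com/brazil.jpg"}]}';

my $data = decode_json($json);
for my $hit ( @{ $data->{results} } ) {
    print "$hit->{title}: $hit->{poster}\n";
}
```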

Re: How to have Perlscript scrape images from a URL that has Javascript?
by harangzsolt33 (Deacon) on Dec 16, 2019 at 07:50 UTC
    If you use Google Chrome or Firefox, you could navigate your web browser to the site where you want to scrape the images from, and enter the following line into the address bar:

    javascript:$URLS=[];for($i=0;$i<document.images.length;$i++)$URLS.push(document.images[$i].src);document.write($URLS.join("<P>"));

    This is all one line with no line breaks and no spaces. When you paste this into the address bar, the "javascript:" prefix is going to disappear, and you'll have to type it in again BEFORE you hit enter. Once you hit enter, it's going to show a list of URLs to every image that has been loaded on the screen. This might do exactly what you're looking for, but it's not Perl code. It's JavaScript. ;-)

      On the plus side, it's working code. However, it requires manual interaction, which defeats the purpose: automation.
        However, it requires manual interaction

        Yes, that's true... And I just thought of something. Some websites don't even load all the images. You have to scroll down in order for the images to load.

        If I had to automate this process, I would download AutoIt. But that's a whole different language. And it only runs on Windows.

      This might do exactly what you're looking for, but it's not Perl code. It's JavaScript. ;-)
      Unlikely; the OP will want to download the images, not use JavaScript to print their src attributes.
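      Once the src URLs are known, the download step needs no JavaScript at all; plain LWP will do. A sketch, assuming absolute URLs (relative ones would first need resolving against the page URL, e.g. with the URI module). Pass image URLs as command-line arguments.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Derive a local filename from a URL: last path segment,
# query string and fragment stripped, unsafe characters replaced.
sub url_to_filename {
    my ($url) = @_;
    ( my $name = $url ) =~ s/[?#].*//;    # drop query/fragment
    $name =~ s{.*/}{};                     # keep last path segment
    $name =~ s/[^\w.\-]/_/g;               # sanitize for the filesystem
    return length $name ? $name : 'index';
}

my $ua = LWP::UserAgent->new( timeout => 20 );
for my $url (@ARGV) {
    my $file = url_to_filename($url);
    # mirror() saves to disk and skips the download
    # if the local copy is already up to date
    my $res = $ua->mirror( $url, $file );
    printf "%s -> %s (%s)\n", $url, $file, $res->status_line;
}
```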