rmperl has asked for the wisdom of the Perl Monks concerning the following question:

I had been using WWW::Mechanize::Firefox to extract hidden HTML from a JavaScript-heavy website, but MozRepl no longer works with recent versions of Firefox.

I thought I could use WWW::Mechanize::PhantomJS in a similar way, but I'm struggling to get it to find the elements and extract the inner HTML via XPath.

The long XPath in the code below should select all the HTML inside:

<div class="tn-napsTable">

This same code worked well with WWW::Mechanize::Firefox, but when I run it with PhantomJS I get the error message 'no elements found'. Any help would be much appreciated!

use strict;
use WWW::Mechanize::PhantomJS;
use Data::Dumper;

my $mech = WWW::Mechanize::PhantomJS->new(
    launch_arg => ['ghostdriver/src/main.js'],
);
$mech->get('https://www.racingpost.com/tipping/naps-table/');
sleep 5;

my $cont = $mech->xpath(
    '/html/body/div[4]/main/div/div[2]/div/div/div[2]/div/div[1]/div[1]/div[2]',
    one => 1, synchronize => 1,
);
my $content = $cont->{innerHTML};
print $content;

Re: No Elements Found with WWW::Mechanize::PhantomJS
by Corion (Patriarch) on Jun 03, 2018 at 07:16 UTC

    At least when visiting the page from Firefox and right-clicking the element, I get a different XPath:

    /html/body/div[4]/main/div/div[2]/div/div/div[1]/div[1]/div[2]

    instead of your query:

    /html/body/div[4]/main/div/div[2]/div/div/div[2]/div/div[1]/div[1]/div[2]

    But why are you using such a complex query when .tn-napsTable should work the same?

    my $cont= $mech->selector('.tn-napsTable', one => 1);

      Thanks for this, much appreciated. I've run the code with your suggested CSS selector, but no matter what I do I keep getting the following error message:

      No elements found for CSS selector '.tn-napsTable' at /usr/local/share/perl/5.22.1/WWW/Mechanize/Plugin/Selector.pm

      And when I run the XPath query, I get the error message: No elements found for /html/body/div[4]/main/div/div[2]/div/div/div[1]/div[1]/div[2]

      Do you have any ideas as to what might be going wrong? I'm at a loss. Thank you again.

        Maybe what PhantomJS sees as HTML content is different from what Firefox sees. In such cases, I usually print out what ->content returns and wonder why that is different from what my browser shows me.

        I admit that most of my development on that family of modules is focused on WWW::Mechanize::Chrome currently, but I'll try to reproduce your case once I get to an environment where I have PhantomJS installed.
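The suggestion above to print out what ->content returns can be sketched roughly as follows. This is only an illustration: report_markup and the file name phantomjs-page.html are hypothetical names, not part of WWW::Mechanize::PhantomJS. The helper saves the HTML for inspection in an editor and counts how often a class name occurs in it, so a count of 0 would suggest PhantomJS never received or rendered the table at all.

```perl
use strict;
use warnings;

# Hypothetical helper: write a chunk of HTML to a file for later inspection
# and return how many times $class occurs anywhere in that HTML.
sub report_markup {
    my ($html, $class, $file) = @_;

    open my $fh, '>:encoding(UTF-8)', $file
        or die "Cannot write $file: $!";
    print {$fh} $html;
    close $fh
        or die "Cannot close $file: $!";

    # Count all (non-overlapping) literal occurrences of the class name.
    my $count = () = $html =~ /\Q$class\E/g;
    return $count;
}

# Usage with the $mech object from the thread (needs PhantomJS, so it is
# left commented out here):
# my $hits = report_markup($mech->content, 'tn-napsTable', 'phantomjs-page.html');
# print "tn-napsTable occurs $hits time(s) in what PhantomJS sees\n";
```

Comparing the saved file with the page source shown by a desktop browser should make it easier to see whether the markup differs or is simply missing.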

        This is really weird. For me, the program below works and outputs roughly the HTML I expect:

        use strict;
        use WWW::Mechanize::PhantomJS;
        use Data::Dumper;

        my $mech = WWW::Mechanize::PhantomJS->new(
            launch_arg => ['ghostdriver/src/main.js'],
        );
        $mech->get('https://www.racingpost.com/tipping/naps-table/');
        sleep 5;

        my $cont = $mech->selector('.tn-napsTable', one => 1);
        my $content = $cont->get_attribute('innerHTML');
        print $content;
        __END__
        <div class="tn-napsTable__header"><div class="tn-napsTable__main"><div role="button" tabindex="0" class="tn-napsTable__cell tn-napsTable__cell_header tn-napsTable__napsTipster_header"><!-- react-text: 15 -->Today's naps / Tipster<!-- /react-text --></div><div role="button" tabindex="0" class="tn-napsTable__cell tn-naps ...

        This is with PhantomJS v 2.1.1 on Windows and WWW::Mechanize::PhantomJS 0.18.
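Both programs in this thread rely on a fixed sleep 5 before querying the page. If the table is rendered later than that on a slower machine, the same "No elements found" error could result. One possible alternative, sketched here under that assumption, is to retry the query until it succeeds or a timeout expires. wait_for is a hypothetical helper, not part of WWW::Mechanize::PhantomJS; it relies only on the behaviour seen above, namely that selector with one => 1 dies when nothing matches.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: call $probe repeatedly until it returns a true value
# or $timeout seconds have passed. An exception thrown by $probe (such as
# "No elements found") is treated as "not there yet".
sub wait_for {
    my ($probe, $timeout, $interval) = @_;
    my $deadline = time() + $timeout;

    while (1) {
        my $result = eval { $probe->() };
        return $result if $result;
        return if time() >= $deadline;   # give up: returns undef
        sleep $interval;
    }
}

# Usage with the $mech object from the thread (needs PhantomJS, so it is
# left commented out here; the 15 second timeout is an arbitrary choice):
# my $cont = wait_for(sub { $mech->selector('.tn-napsTable', one => 1) }, 15, 0.5)
#     or die "tn-napsTable did not appear within 15 seconds";
# print $cont->get_attribute('innerHTML');
```

If the element still never appears with a generous timeout, the difference is more likely in the HTML that PhantomJS receives than in timing.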