morgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am playing around with WWW::Mechanize::Chrome and would like to understand the following:

When (as an example) I do this:

use strict;
use WWW::Mechanize::Chrome;

my $url = "https://www.economist.com/china/2018/11/03/think-of-china-as-a-giant-sub-prime-lender-in-latin-america";

my $mech = WWW::Mechanize::Chrome->new();
$mech->get($url);

print $_->nodeName, "\n\n", $_->get_attribute('innerHTML'), "\n\n"
    for $mech->selector("html");

I get several HTML documents, because $mech->selector("html") returns not just one element (as I had expected) but a whole array. I have no clue what the extra ones are; maybe iframes, I don't know.

What is the proper way to only retrieve the html of the main page?

I hope my question is understandable...

Many thanks!

Re: get html via WWW::Mechanize::Chrome
by Corion (Patriarch) on Nov 04, 2018 at 15:54 UTC

    Maybe just use

    $mech->content

    Also, when diagnosing problems and suspecting iframe elements, what kept you from verifying your suspicion?
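
To make the ->content suggestion concrete, here is a minimal sketch. It assumes a local Chrome that WWW::Mechanize::Chrome can launch with default settings, and the URL is a placeholder rather than the one from the question. As far as I know, content returns the HTML of the top-level document as a single string, which is what the question asks for.

```perl
use strict;
use warnings;
use WWW::Mechanize::Chrome;

# Placeholder URL for illustration; substitute the page you are inspecting.
my $url = 'https://example.com/';

my $mech = WWW::Mechanize::Chrome->new();
$mech->get($url);

# content() gives the page HTML as one string, so no element list
# has to be filtered afterwards.
my $html = $mech->content;
print $html, "\n";
```

This sidesteps the selector entirely, so the question of which html element belongs to the main page never comes up.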

      Not knowing how to do it is what kept me.

      If I have this array returned from $mech->selector("html") - how do I filter out the iframes?

      Many thanks!

        Maybe now is a good time to learn about the "view-source" protocol (hotkey Ctrl+U), which makes almost all browsers show you the HTML source of the current page. Also consider the Chrome Developer Tools, which can be reached with Ctrl+Shift+I in Google Chrome.

        What is wrong with using ->content?

        If you have a bunch of html elements, maybe you can check ->get_attribute('ownerDocument'), but that's just a wild guess on my part.
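
One way to check the iframe suspicion from inside the script is sketched below. It only uses calls that already appear in this thread (selector and get_attribute). The URL is a placeholder, and the idea that the top-level document is the first element returned is an assumption that should be checked against the output, not a documented guarantee.

```perl
use strict;
use warnings;
use WWW::Mechanize::Chrome;

# Placeholder URL for illustration; substitute the page you are inspecting.
my $url = 'https://example.com/';

my $mech = WWW::Mechanize::Chrome->new();
$mech->get($url);

# List the iframes on the page to see whether frames could explain
# the extra html elements.
for my $frame ($mech->selector('iframe')) {
    my $src = $frame->get_attribute('src');
    print "iframe: ", (defined $src ? $src : '(no src)'), "\n";
}

# Count the html elements the selector returns.
my @docs = $mech->selector('html');
printf "%d html element(s) matched\n", scalar @docs;

# Assumption: the top-level document comes first in the list.
my ($main) = @docs;
print $main->get_attribute('innerHTML'), "\n" if $main;
```

If the number of html elements is one more than the number of iframes, that supports the guess that the extras come from frames. For just the main page, ->content remains the simpler route.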