get html via WWW::Mechanize::Chrome

morgon has asked for the wisdom of the Perl Monks concerning the following question:

I am playing around with WWW::Mechanize::Chrome and would like to understand the following:

When (as an example) I do this:

use strict;
use WWW::Mechanize::Chrome;

my $url = "https://www.economist.com/china/2018/11/03/think-of-china-as-a-giant-sub-prime-lender-in-latin-america";
my $mech = WWW::Mechanize::Chrome->new();

$mech->get($url);

print $_->nodeName, "\n\n", $_->get_attribute('innerHTML'), "\n\n" for $mech->selector("html");

I get several HTML documents, because $mech->selector("html") returns not just one element (as I had expected) but a whole array. I have no clue what the extra elements are; maybe iframes, I don't know.

What is the proper way to retrieve only the HTML of the main page?

I hope my question is understandable...

Many thanks!


Re: get html via WWW::Mechanize::Chrome by Corion (Patriarch) on Nov 04, 2018 at 15:54 UTC
Maybe just use `$mech->content`. Also, when diagnosing problems and suspecting `iframe` elements, what kept you from verifying your suspicion?
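A minimal sketch of the `->content` suggestion, reusing the URL and constructor call from the question. It assumes a local Chrome that WWW::Mechanize::Chrome can launch, and it assumes that `content` returns the HTML of the top-level page as a single string rather than one entry per frame; it has not been checked against this particular site.

```perl
use strict;
use warnings;
use WWW::Mechanize::Chrome;

# URL taken from the question above.
my $url = "https://www.economist.com/china/2018/11/03/think-of-china-as-a-giant-sub-prime-lender-in-latin-america";

my $mech = WWW::Mechanize::Chrome->new();
$mech->get($url);

# Ask for the page content as one string instead of
# collecting every matching <html> element via selector().
my $html = $mech->content;
print $html, "\n";
```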
Re^2: get html via WWW::Mechanize::Chrome by morgon (Priest) on Nov 04, 2018 at 16:05 UTC
Not knowing how to do it is what kept me. If I have this array returned from $mech->selector("html"), how do I filter out the iframes? Many thanks!
Re^3: get html via WWW::Mechanize::Chrome by Corion (Patriarch) on Nov 04, 2018 at 16:13 UTC
Maybe now is a good time to learn about the "`view-source`" protocol (hotkey `ctrl+u`), which makes almost all browsers show you the HTML source of the current page. Also consider the Chrome Developer Tools, which can be reached using `ctrl+shift+i` in Google Chrome. What is wrong with using `->content`? If you have a bunch of `html` elements, maybe you can check `->get_attribute('ownerDocument')`, but that's just a wild guess on my part.
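One way to check the iframe suspicion from Perl rather than from the browser is to count the `iframe` elements on the page. This is only a sketch: it uses the same `selector` and `get_attribute` calls as the question, assumes a local Chrome is available, and does not confirm that the extra `html` elements really come from these frames.

```perl
use strict;
use warnings;
use WWW::Mechanize::Chrome;

# URL taken from the question above.
my $url = "https://www.economist.com/china/2018/11/03/think-of-china-as-a-giant-sub-prime-lender-in-latin-america";

my $mech = WWW::Mechanize::Chrome->new();
$mech->get($url);

# Collect the <iframe> elements and show where each one points.
my @iframes = $mech->selector("iframe");
print scalar(@iframes), " iframe element(s) found\n";

for my $frame (@iframes) {
    # Frames without a src attribute get a placeholder label.
    my $src = $frame->get_attribute('src') // '(no src attribute)';
    print "  $src\n";
}
```

If the count lines up with the number of extra `html` elements, frames are a plausible explanation.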
Re^4: get html via WWW::Mechanize::Chrome by morgon (Priest) on Nov 04, 2018 at 16:19 UTC