
HTML::Display

https://metacpan.org/pod/WWW::Mechanize#mech-agent_alias-alias

WWW::UserAgent::Random - Perl extension to generate random User Agent / List of User-Agents (Spiders, Robots, Browser)

Re^4: getting content of an https website
by Aldebaran (Curate) on Sep 01, 2015 at 07:54 UTC

    Thanks AM, I got pretty far with this:

    use strict;
    use warnings;
    use feature 'say';
    use HTML::Display;
    use LWP::UserAgent;

    my $url = 'https://berniesanders.com/issues/racial-justice/';
    my $ua  = LWP::UserAgent->new();
    $ua->agent('Windows Mozilla');
    my $response = $ua->get($url);
    my $content  = $response->content;

    $ENV{'PERL_HTML_DISPLAY_COMMAND'} = 'run "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" %s';
    my $browser = HTML::Display->new();
    if (defined($browser)) {
        $browser->display(html => $content);
    }
    else {
        print("Unable to open browser: $@\n");
    }

    Almost everything gets displayed except the big banner at the top and some stylized words at the bottom. The links with absolute URLs work, but the browser's forward and back buttons behave clumsily when returning to the original page. And what is the original? In the address bar it looks like this:

    file:///C:/cygwin64/tmp/9EQdRdu_5w.html

    I have trouble deciding how "real" this is at all. Tomorrow, I'll try a different site and see what happens. Thank you.
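
    One possible explanation for the missing banner and styling, offered as an assumption rather than something confirmed in this thread: the page is rendered from a temporary file:/// copy, so any relative links to images or stylesheets no longer resolve against the original site. A minimal sketch of one workaround is to insert a <base href> tag into the fetched HTML before displaying it. The helper name add_base_href and the sample markup below are hypothetical and not part of HTML::Display or LWP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Display or LWP): insert a
# <base href="..."> tag right after the opening <head> tag, so that the
# browser resolves relative links against the original site rather than
# against the temporary file:/// location.
sub add_base_href {
    my ($html, $base_url) = @_;

    # Leave the page alone if it already declares a base URL.
    return $html if $html =~ /<base\b[^>]*\bhref\s*=/i;

    my $tag = qq{<base href="$base_url">};

    # Insert after the first <head ...> tag if there is one ...
    return $html if $html =~ s/(<head\b[^>]*>)/$1$tag/i;

    # ... otherwise fall back to prepending the tag to the document.
    return $tag . $html;
}

# Small stand-in for a fetched page; in the script above this would be $content.
my $sample = '<html><head><title>t</title></head>'
           . '<body><img src="/img/banner.png"></body></html>';

print add_base_href($sample, 'https://berniesanders.com/issues/racial-justice/'), "\n";
```

    In the script above, the result of add_base_href($content, $url) would be passed to $browser->display(html => ...) in place of $content. Whether this brings back the banner depends on how the site loads it, so treat it as an experiment, not a guaranteed fix.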