hahnfam has asked for the wisdom of the Perl Monks concerning the following question:

I have to check for broken links in the secure pages (https://) of a website. Could someone please tell me how I can get the HTML from an https web page? My code below does not work. I would greatly appreciate it if someone could point me to some sample code. I am a newbie in Perl. Thank you for your help.
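#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Crypt::SSLeay;    # SSL support for LWP

my $outfile = "output.txt";
open(my $out, '>', $outfile) || die("Cannot open OUTPUT file - $!");

my $myurl = "https://sharecenter.com/Pages/Software.aspx";    # the "=" was missing
my $ua = LWP::UserAgent->new;
$ua->cookie_jar( {} );
$ua->protocols_allowed( [ 'http', 'https' ] );
# $ua->proxy(['http', 'https']) was dropped: proxy() needs a proxy URL as
# its second argument, and without one the call sets nothing.

my $page = $ua->get($myurl);
if ($page->is_success) {
    print {$out} $page->content;
}
else {
    # Report why the request failed instead of failing silently.
    warn "Failed: ", $page->status_line, "\n";
}
close($out);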

Re: How to get html code from a secure (https:\\) page?
by wind (Priest) on Feb 16, 2011 at 00:12 UTC
    Using LWP with SSL is documented here: lwpcook#HTTPS. The only problem I can see from looking at your code is that the website listed doesn't actually exist as an SSL server. Just use the example from the docs; you'll get better error messages and will hopefully be able to debug your problem:
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => 'https://encrypted.google.com/');
    my $res = $ua->request($req);
    if ($res->is_success) {
        print $res->as_string;
    }
    else {
        print "Failed: ", $res->status_line, "\n";
    }
      Thank you, wind. I ran the sample code and got a "500 Connect failed" message. Based on the document you pointed me to, I need to install the SSL interface. Do you know where I can download it? Thanks again.
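
Before downloading anything, it may help to see which SSL-related modules this perl can already load. The following is only a minimal sketch: the helper name module_available is made up for illustration, and the module list is an assumption about what is relevant. Crypt::SSLeay and IO::Socket::SSL are the classic SSL interfaces for LWP, and on LWP 6 and later the https support lives in the separate LWP::Protocol::https distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;

    # Only accept plain package names before handing them to string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;

    my $ok = eval "require $module; 1";
    return $ok ? 1 : 0;
}

# Report on the modules that usually matter for https requests with LWP.
for my $module (qw(LWP::UserAgent LWP::Protocol::https IO::Socket::SSL Crypt::SSLeay)) {
    printf "%-22s %s\n", $module,
        module_available($module) ? 'installed' : 'not found';
}
```

Anything reported as "not found" can normally be installed from CPAN, for example with `cpan Crypt::SSLeay`, or through the operating system's package manager.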
Re: How to get html code from a secure (https:\\) page?
by karlgoethebier (Abbot) on Jan 22, 2018 at 09:56 UTC

    When I do it like this...

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use WWW::Curl::Easy;
    use Data::Dump;

    my $fetch = sub {
        my $curl = WWW::Curl::Easy->new();
        my ( $header, $body );
        $curl->setopt( CURLOPT_URL,            shift );
        $curl->setopt( CURLOPT_WRITEHEADER,    \$header );
        $curl->setopt( CURLOPT_WRITEDATA,      \$body );
        $curl->setopt( CURLOPT_FOLLOWLOCATION, 1 );
        $curl->setopt( CURLOPT_TIMEOUT,        10 );
        $curl->setopt( CURLOPT_SSL_VERIFYPEER, 1 );
        $curl->perform;
        {
            header => $header,
            body   => $body,
            info   => $curl->getinfo(CURLINFO_HTTP_CODE),
            error  => $curl->errbuf,
        };
    };

    my $result = $fetch->(shift);
    dd $result;
    __END__

    ...I get:

    karls-mac-mini:playground karl$ ./curl.pl https://sharecenter.com/Pages/Software.aspx
    {
      body   => undef,
      error  => "SSL peer certificate or SSH remote key was not OK",
      header => undef,
      info   => 0,
    }

    ...but with $curl->setopt( CURLOPT_SSL_VERIFYPEER, 0 ); I get:

    karls-mac-mini:playground karl$ ./curl.pl https://sharecenter.com/Pages/Software.aspx
    {
      body   => "<!-- b2 -->",
      error  => "",
      header => "HTTP/1.0 200 OK\r\nDate: Mon, 22 Jan 2018 09:47:51 GMT\r\nServer: Apache/2.2.22\r\nExpires: Mon, 26 Jul 1997 05:00:00 GMT\r\nLast-Modified: Mon, 22 Jan 2018 09:47:51 GMT\r\nCache-Control: no-store, no-cache, must-revalidate\r\nCache-Control: post-check=0, pre-check=0\r\nPragma: no-cache\r\nSet-Cookie: tu=dc6816b4e45149c7421e46e39052dfef; expires=Tue, 31-Dec-2019 23:00:00 GMT; Max-Age=61218729; path=/; domain=sharecenter.com; httponly\r\nX-Adblock-Key: MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANnylWw2vLY4hUn9w06zQKbhKBfvjFUCsdFlb6TdQhxb9RXWXuI4t31c+o8fYOv/s8q1LGPga3DE1L/tHU4LENMCAwEAAQ==_heva/qNbVoSrOKfx6K0UI/DeonTq8ke19pivgTgrL2w9ZtF3/lPIuu2AIlia5FA69jmNJzQb9Afod5WU1oglQw==\r\nVary: Accept-Encoding\r\nContent-Length: 11\r\nContent-Type: text/html; charset=UTF-8\r\nX-Cache: MISS from 110132\r\nConnection: close\r\n\r\n",
      info   => 200,
    }

    I don't know if this is helpful or what you expected.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

      You also get an error if you visit the site with a browser: NET::ERR_CERT_COMMON_NAME_INVALID. This is because the certificate is issued for cc.sedoparking.com which is a domain parking service. Looking at the http instead of https URL you'll see that the site is actually for sale. This suggests that the resource you want to access is no longer available under this URL.
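
To illustrate why a certificate issued for cc.sedoparking.com is rejected for sharecenter.com, here is a rough sketch of the name comparison a client performs. It is a simplification rather than what curl or a browser actually runs: the function name cert_name_matches is hypothetical, and only exact names and a single leading "*." wildcard are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified check of a certificate name against the host
# name that was requested. A leading "*." stands for exactly one label.
sub cert_name_matches {
    my ($cert_name, $host) = (lc $_[0], lc $_[1]);

    if ($cert_name =~ /\A\*\.(.+)\z/) {
        my $suffix = $1;
        return $host =~ /\A[^.]+\.\Q$suffix\E\z/ ? 1 : 0;
    }

    # No wildcard: the names must be identical (ignoring case).
    return $cert_name eq $host ? 1 : 0;
}

# The certificate name comes from the post above; compare it with both hosts.
for my $host (qw(cc.sedoparking.com sharecenter.com)) {
    printf "%-20s %s\n", $host,
        cert_name_matches('cc.sedoparking.com', $host)
            ? 'matches the certificate'
            : 'does not match the certificate';
}
```

Real verification also checks the certificate chain, validity dates and subject alternative names, so this only shows the name mismatch behind the error.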

        Mmh, as the OP wrote: "...check the broken links..."? If this is what he meant, and I guessed...
