neil4636 has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I have some Perl knowledge, but I need help achieving my goal. Basically, I want to monitor a web page for changes, i.e. fetch the HTML from the website and store it, then compare the stored copy with the live page every 5 minutes. The problem is that the website uses HTTPS and requires a login before I can view the page I want to look at. I have the username, password, etc. What are my options? Cheers, Neil.

Re: Help Fetch HTML
by marto (Cardinal) on May 22, 2012 at 14:46 UTC

    You could use the WWW::Mechanize module to save the page locally and then compare it against the copy saved by a previous run. If the page in question requires JavaScript, see WWW::Mechanize::FAQ for a list of alternative modules.
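A minimal sketch of that approach, assuming the page can be fetched without logging in (the login is a separate problem) and assuming a hypothetical local file name page.html. It fetches the page with WWW::Mechanize, compares it with the copy saved last time, saves the new copy, and repeats every 5 minutes as the question asks. WWW::Mechanize is loaded lazily so the comparison helper can be used on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare freshly fetched HTML with the copy saved on the previous run.
# Returns 1 if the content differs (or there is no previous copy), else 0.
# Always leaves the new content in $saved_file for the next run.
sub page_changed {
    my ($saved_file, $new_content) = @_;
    my $old_content;
    if (open my $in, '<:encoding(UTF-8)', $saved_file) {
        local $/;                      # slurp the whole file
        $old_content = <$in>;
        close $in;
    }
    open my $out, '>:encoding(UTF-8)', $saved_file
        or die "Cannot write $saved_file: $!";
    print {$out} $new_content;
    close $out or die "Cannot close $saved_file: $!";
    return (!defined $old_content || $old_content ne $new_content) ? 1 : 0;
}

# Fetch a page with WWW::Mechanize; autocheck makes get() die on HTTP errors.
# content() returns the decoded text of the page.
sub fetch_page {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url);
    return $mech->content;
}

# Poll every 5 minutes (300 s) when a URL is given on the command line.
if (@ARGV) {
    my $url = shift @ARGV;
    while (1) {
        my $html    = fetch_page($url);
        my $changed = page_changed('page.html', $html);   # hypothetical file name
        printf "%s: %s\n", scalar(localtime()), $changed ? 'changed' : 'no change';
        sleep 300;
    }
}
```

Run it as perl watch.pl followed by the page URL (watch.pl is just an example name). The first run always reports "changed" because there is nothing to compare against yet.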

Re: Help Fetch HTML
by talexb (Chancellor) on May 22, 2012 at 15:06 UTC

    My suggestion is to skip fetching the whole page with the excellent Mech module and just do a HEAD request on the URL, using some combination of the -i (If-Modified-Since) and -o text (text output) options of the HEAD command that comes with libwww-perl.
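For the same idea from inside a Perl script, here is a sketch that swaps in HTTP::Tiny (core since Perl 5.14; HTTPS additionally needs IO::Socket::SSL and Net::SSLeay) instead of the HEAD command. It sends a HEAD request with an If-Modified-Since header and reports whether the server answered 304 Not Modified. It does not handle the login, and the 5-minute window is an assumption matching the question's polling interval.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Format an epoch time as an HTTP-date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
# Built by hand so the day/month names do not depend on the current locale.
sub http_date {
    my ($epoch) = @_;
    my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Send a HEAD request carrying If-Modified-Since and return the status code.
# A server that supports conditional requests answers 304 if nothing changed
# since $since_epoch; HTTP::Tiny reports its own connection errors as 599.
sub head_if_modified {
    my ($url, $since_epoch) = @_;
    my $res = HTTP::Tiny->new->head($url, {
        headers => { 'If-Modified-Since' => http_date($since_epoch) },
    });
    return $res->{status};
}

if (@ARGV) {
    my ($url) = @ARGV;
    my $status = head_if_modified($url, time - 300);   # changed in the last 5 minutes?
    print $status == 304 ? "not modified\n" : "status $status (possibly modified)\n";
}
```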

    Alex / talexb / Toronto

      This approach might not work: many dynamically generated web pages (if the page in question is one) ignore the If-Modified-Since header and simply return the whole page with a 200 status for every GET request.
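Given that, a more robust check is to accept a 304 as "unchanged" but never trust a 200, and instead compare a digest of the body with the one stored on the previous run. A sketch using the core Digest::SHA and Encode modules; check_change is a hypothetical helper name, and how the digest is persisted between runs (a small file, say) is left open.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Encode qw(encode_utf8);

# Decide whether a page changed, given the HTTP status of a conditional GET,
# the response body, and the digest stored on the previous run (undef on the
# first run). Returns (changed_flag, digest_to_store).
sub check_change {
    my ($status, $body, $old_digest) = @_;

    # 304 Not Modified: the server itself says nothing changed.
    return (0, $old_digest) if $status == 304;
    die "Unexpected HTTP status $status\n" unless $status == 200;

    # A 200 may only mean the server ignored If-Modified-Since, so hash the
    # body (as UTF-8 bytes, since Digest::SHA works on bytes) and compare.
    my $digest  = sha256_hex(encode_utf8($body));
    my $changed = (!defined $old_digest || $digest ne $old_digest) ? 1 : 0;
    return ($changed, $digest);
}
```

In a polling loop you would pass it the status and content of each response (for example $res->{status} and $res->{content} from HTTP::Tiny) and save the returned digest for the next run.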

Re: Help Fetch HTML
by zentara (Cardinal) on May 22, 2012 at 15:27 UTC
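    If you want to stick with plain LWP rather than WWW::Mechanize, here is how to do it. Note that this sends the credentials with HTTP Basic authentication, so it only helps if the site uses Basic auth rather than an HTML login form.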
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTTP::Request::Common qw(GET);
    use LWP::UserAgent;    # HTTPS support also needs LWP::Protocol::https

    my $url = 'https://zentara.zentara.net/~zentara/zentara1.avi';
    # Save under the last path component of the URL, e.g. "zentara1.avi"
    my $filename = substr( $url, rindex( $url, "/" ) + 1 );

    my $user = 'zentara';
    my $pass = 'foobar';

    my $ua = LWP::UserAgent->new;
    $ua->protocols_allowed( ['https'] );

    # Set up the request with HTTP Basic authentication
    my $req = GET($url);
    $req->authorization_basic( $user, $pass );

    # Do it
    my $response = $ua->request($req);
    if ( $response->is_error ) {
        printf " %s\n", $response->status_line;
        print "https request error!\n";
    }
    else {
        # Only create the file once we have a successful response;
        # :raw keeps binary content (such as an .avi) intact
        open( my $out, '>:raw', $filename ) or die "Cannot open $filename: $!";
        print {$out} $response->content;
        close $out or die "Cannot close $filename: $!";
    }
    print $response->status_line, "\n";
    exit;

      Thank you for your replies. The form I need to log in with is at https://secure.dutysheet.com/. I think it uses cookies to store the login. When I use zentara's method above, it just saves the login page to the file. Cheers, Neil.
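Since the site appears to use an HTML login form with cookies rather than HTTP Basic authentication, authorization_basic will not get past it. One option is to let WWW::Mechanize fill in and submit the form; by default it keeps cookies in an in-memory cookie jar, so the session cookie set at login is sent with the next request. This is a sketch only: the field names username and password and the target URL are hypothetical placeholders (read the real names from the form's HTML source), and the login-page check is a rough heuristic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic (an assumption, not a guarantee): if the page we got back still
# contains a password input, we are probably looking at the login form again.
sub looks_like_login_page {
    my ($html) = @_;
    return $html =~ /<input\b[^>]*\btype\s*=\s*["']?password\b/i ? 1 : 0;
}

# Log in through the HTML form, then fetch the protected page.
# WWW::Mechanize's default cookie jar carries the session cookie forward.
# The field names below are HYPOTHETICAL; use the ones from the real form.
sub fetch_after_login {
    my ($login_url, $target_url, $user, $pass) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($login_url);
    $mech->submit_form(
        with_fields => { username => $user, password => $pass },
    );
    $mech->get($target_url);
    die "Still seeing a login form; check field names and credentials\n"
        if looks_like_login_page($mech->content);
    return $mech->content;
}

# Usage: perl login.pl LOGIN_URL TARGET_URL USER PASS
if (@ARGV == 4) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_after_login(@ARGV);
}
```

The returned HTML can then be fed to the same save-and-compare step used for the unauthenticated case.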