webpage update watch - used to watch event update

by Qiang (Friar)
on Jun 11, 2005 at 21:25 UTC ([id://465850])
Category: Utility Scripts
Description: Ever read something like 'come back in a few days to check ...' on a web page? I certainly have. How do you find out when the event gets posted?

I wrote this script to watch for event updates on certain web pages. It keeps each page's last update time (the modification time the server reports for the page) in a plain file and compares it with the page's current update time. If the two don't match, it sends me an email about the update.

The hash that stores the page info is a little lame; it could be factored out into a config file with the help of Config::Tiny or a similar module.
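A minimal sketch of that refactoring, assuming Config::Tiny is installed from CPAN: each watched page becomes an INI section carrying the same time_stamp and url_to_check keys the hash uses. The INI text is inlined and parsed with read_string so the example runs on its own; the helper name checks_from_ini is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config::Tiny;

# Hypothetical INI layout: one [section] per watched page, with the
# same keys as the %checks hash in the script.
my $ini = <<'END_INI';
[seneca speaker series]
time_stamp   = /home/qiang/bin/speaker_prev_mod
url_to_check = http://cs.senecac.on.ca/speakers/speakersFlash.html

[cherry picking opening]
time_stamp   = /home/qiang/bin/cherrypick_prev_mod
url_to_check = http://cherryavenuefarms.org/
END_INI

# Build the %checks structure from the parsed sections, skipping
# Config::Tiny's root section '_' (keys that precede any [section]).
sub checks_from_ini {
    my ($text) = @_;
    my $config = Config::Tiny->read_string($text)
        or die 'bad config: ' . Config::Tiny->errstr . "\n";
    return map { $_ => $config->{$_} } grep { $_ ne '_' } keys %$config;
}

my %checks = checks_from_ini($ini);
```

In the real script, Config::Tiny->read with a file path (for example a hypothetical /home/qiang/bin/watch.ini) would load the same structure from disk, and the watch loop itself would not need to change.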

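I set this script up as a cron job and run it daily, and I have never missed an event I was interested in. It probably only works on static web pages, since it depends on the server reporting a Last-Modified time, which dynamically generated pages often don't send.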
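For pages that send no Last-Modified time, one workaround (not part of the original script) is to fingerprint the page body instead of its date. A minimal sketch using the core Digest::MD5 and Encode modules; the function names are hypothetical. Note that any change at all, including rotating ads or an embedded clock, would count as an update.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hex MD5 of the page body. The text is encoded to UTF-8 bytes first,
# because md5_hex() croaks on characters above 255.
sub content_fingerprint {
    my ($html) = @_;
    return md5_hex(encode('UTF-8', $html));
}

# True when the body no longer matches the fingerprint saved on the
# previous run; an empty saved value ("nothing recorded yet") always differs.
sub page_changed {
    my ($prev_fingerprint, $html) = @_;
    return content_fingerprint($html) ne $prev_fingerprint;
}
```

In the watch loop, the HTML would come from LWP::Simple's get($url), and the fingerprint would be written to the time_stamp file where the date string goes now.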

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

# file to store the update time
my $speaker_prev_mod_time = '/home/qiang/bin/speaker_prev_mod';
# webpage to watch
my $speaker_url = 'http://cs.senecac.on.ca/speakers/speakersFlash.html';

# file to store the update time 
my $cherry_prev_mod_time = '/home/qiang/bin/cherrypick_prev_mod';
# another webpage to watch
my $cherry_url = 'http://cherryavenuefarms.org/';

# update is sent to this email box.
my $email = 'your@email.com';

# build the webpage hash
my %checks = ( 'seneca speaker series' => 
                { 'time_stamp' =>   $speaker_prev_mod_time,
                  'url_to_check' => $speaker_url 
                },
               'cherry picking opening' =>
                { 'time_stamp' =>   $cherry_prev_mod_time,
                  'url_to_check' => $cherry_url }
             );

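foreach my $k (keys %checks) {
    my $v = $checks{$k};
    my $prev_mod_time = get_prev($v->{time_stamp});
    my $cur_mod_time  = get_current($v->{url_to_check});

    # skip pages whose HEAD request failed or that send no Last-Modified
    next unless defined $cur_mod_time;

    if ($cur_mod_time ne $prev_mod_time) {
        update_mod_time($cur_mod_time, $v->{time_stamp});
        # list-form pipe open: no shell, so $k and $email need no quoting
        open my $mail, '|-', '/bin/mail', '-s', "$k page just updated", $email
            or die "cannot run /bin/mail: $!";
        print $mail "$v->{url_to_check} modified on $cur_mod_time\n";
        close $mail;
    }
}

# get current update time for the page
sub get_current {
    my $url = shift;
    # head() returns ($content_type, $document_length, $modified_time,
    # $expires, $server), or an empty list if the request fails
    my @headers = head($url) or return;
    return unless defined $headers[2];
    return scalar localtime($headers[2]);
}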

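# get prev update time for the page ('' if nothing has been recorded yet)
sub get_prev {
    my $file_timestamp = shift;
    # no record yet: '' never equals a real time, so the first run reports an update
    return '' unless -e $file_timestamp;
    open my $fh, '<', $file_timestamp or die "cannot read $file_timestamp: $!";
    my $mod_time = <$fh>;
    close $fh;
    return '' unless defined $mod_time;    # empty file
    chomp $mod_time;
    return $mod_time;
}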

# record the update time for the page
sub update_mod_time {
    my ($mod_time, $file_timestamp) = @_;
    open my $fh, '>', $file_timestamp or die "cannot write $file_timestamp: $!";
    print $fh $mod_time;
    close $fh;
}
Re: webpage update watch - used to watch event
by merlyn (Sage) on Jun 11, 2005 at 21:36 UTC
    If you use LWP::Simple's mirror method, you could let your local copy maintain the timestamp for you, and then check if there's a 200 or 304 return code, to tell you if something changed or not. Would replace almost all your code with just a few lines.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      damn! indeed. merlyn++

      200 OK, 304 Not Modified

      [qiang@dev qiang]$ perl -MLWP::Simple -e '$x=mirror("http://cherryavenuefarms.org/","tt");print $x'
      200
      [qiang@dev qiang]$ perl -MLWP::Simple -e '$x=mirror("http://cherryavenuefarms.org/","tt");print $x'
      304
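Along the lines of merlyn's suggestion, a minimal sketch of the watcher rebuilt around LWP::Simple's mirror(). It assumes the server honors If-Modified-Since (mirror sends it based on the local copy's modification time); the command-line interface and the names classify_status and check_pages are hypothetical. Instead of calling /bin/mail, it prints the updated URLs and relies on cron mailing a job's output to its owner, which common cron implementations do, but check yours.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(mirror);

# Translate the HTTP status code returned by mirror() into a verdict.
sub classify_status {
    my ($code) = @_;
    return 'updated'   if $code == 200;   # 200 OK: a newer copy was fetched
    return 'unchanged' if $code == 304;   # 304 Not Modified: local copy is current
    return 'error';                        # anything else: the check failed
}

# Mirror each URL into its local file; return the URLs that changed.
sub check_pages {
    my @pairs = @_;                        # (url, file, url, file, ...)
    my @updated;
    while (my ($url, $file) = splice(@pairs, 0, 2)) {
        my $status = classify_status(mirror($url, $file));
        if    ($status eq 'updated') { push @updated, $url }
        elsif ($status eq 'error')   { warn "could not check $url\n" }
    }
    return @updated;
}

# Hypothetical usage: script.pl URL FILE [URL FILE ...]
die "usage: $0 URL FILE [URL FILE ...]\n" if @ARGV % 2;
print "$_ was updated\n" for check_pages(@ARGV);
```

On the first run there is no local copy, so mirror() fetches the whole page and returns 200, which this sketch treats as an update, much as the original script mails on its first run.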
