Skyhawlk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to write code that will "monitor" a file on the web and email me when someone modifies or replaces it. Here's what I have been trying so far:
    use LWP::UserAgent;
    use CGI qw(header -no_debug);
    use Net::SMTP;

    #my $url = 'http://130.119.7.69/Autosubscriptions/URLsub/winzip.log';
    my $url = 'https://myweb.server.com/URLsub/some.log';
    my $res = LWP::UserAgent->new->request(HTTP::Request->new(HEAD => $url));
    my $ServerName = "mySMTP.gateway.com";
    my $file="modifiedtest.txt";

    # print header;
    local $\ = "\n";
    if ($res->is_success) {
        # $res->previous &&
        #     $res->previous->is_redirect and print 'redirected: ', $res->request->url;
        $res->last_modified and $modified = scalar(localtime($res->last_modified));
        # $res->content_length and print 'size: ', $res->content_length;
    } else {
        print $res->status_line;
    }

    #read the file content
    open FILE, $file or die "Couldn't open file: $!";
    while (<FILE>){
        $string .= $_;
    }
    close FILE;

    #open(DAT, $file) || die("Could not open file!");
    #@data=<DAT>;
    print "file data is $string";
    print "web file data is $modified";

    #check to see if we have latest info
    if ($modified = $string) {
        print "the file is the same";
    } else {
        print "proceeding - file is not the same...";
        open(DAT,">$file") || die("Cannot Open File");
        print DAT "$modified";
        close(DAT);

        $smtp = Net::SMTP->new($ServerName);
        die "Couldn't connect to server" unless $smtp;

        my $MailFrom = "monitor\@netvision.net.il";
        my $MailTo = "avishay\@mail.com";

        $smtp->mail( $MailFrom );
        $smtp->to( $MailTo );
        # Start the mail
        $smtp->data();
        # Send the header.
        $smtp->datasend("To: avishay\@mail.com\n");
        $smtp->datasend("From: monitor\@netvision.net.il\n");
        $smtp->datasend("Subject: There is new file out there!\n");
        $smtp->datasend("\n");
        # Send the message
        $smtp->datasend("New file (time stamped at $modified)\n\n");
        # Send the termination string
        $smtp->dataend();
        $smtp->quit();
    }
Now, I can't seem to correctly compare $modified and $string, although both print their values correctly. What am I missing? Thanks, Sky

Re: Compare file content with a variable content
by mr.nick (Chaplain) on May 19, 2001 at 18:57 UTC
    I would probably have used a different approach and not relied on the Last-Modified header from the server. For tasks like these, I'm in love with MD5 ...
    #!/usr/bin/perl -w
    use strict;
    use Digest::MD5 qw(md5_hex);   # successor to the old MD5 module
    use LWP::Simple;

    my $url='http://www.mrnick.binary9.net';
    my $file='sitechanges.txt';

    ## grab the contents (get() returns undef on failure; $! is not set by it)
    my $content=get($url);
    die "Couldn't fetch $url\n" unless defined $content;

    ## grab the MD5 sum from our datafile
    my $oldmd5='';
    if (-f $file) {
        open(IN,"<$file") || die $!;
        $oldmd5=<IN>;
        close IN;
        $oldmd5='' unless defined $oldmd5;   # empty datafile
        chomp $oldmd5;
    }

    ## generate the new MD5
    my $newmd5=md5_hex($content);

    ## same?
    if ($newmd5 ne $oldmd5) {
        ## file has changed ... do something
        ## .... (insert your mailing code here)

        ## then save the new md5
        open(OUT,">$file") || die $!;
        print OUT $newmd5;
        close OUT;
    }
    But to answer your question, change
    if ($modified = $string) { print "the file is the same";
    to
    if ($modified eq $string) { print "the file is the same";
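Why the original line always looks like a match: `=` copies `$string` into `$modified`, and the `if` then tests the copied value, so any non-empty string other than "0" counts as "the same". The sketch below (the sub names `compare_assign` and `compare_eq` are made up for illustration) contrasts the buggy assignment with a real `eq` comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mimics the original bug: "=" assigns instead of comparing, so the
# condition is true whenever $string itself is true.
sub compare_assign {
    my ($modified, $string) = @_;
    if ($modified = $string) {
        return "same";
    }
    return "different";
}

# The fix: eq compares the two strings character by character.
sub compare_eq {
    my ($modified, $string) = @_;
    return $modified eq $string ? "same" : "different";
}

my ($web, $stored) = ('Sat May 19 18:57:00 2001', 'Fri May 18 09:00:00 2001');
print "with = : ", compare_assign($web, $stored), "\n";   # same (wrong)
print "with eq: ", compare_eq($web, $stored), "\n";       # different
```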

    Update: The size of the files you are grabbing IS very important. 18MB is quite a bit for casual downloading just for a comparison. In that case, I would also rely on the headers returned from HEAD (including Content-Length).
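A minimal sketch of the HEAD-only approach, assuming it is enough to treat a file as changed when either Last-Modified or Content-Length differs from the last run. It stores the raw epoch value from `last_modified` (not a `localtime` string) together with the length as one `epoch:length` line, so the comparison is a plain `eq` on two short strings. The script name, the state-file argument, and the helper subs `head_fingerprint` and `has_changed` are hypothetical; the network part only runs when two command-line arguments are given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Joins the HEAD values we care about into one line; missing headers
# become empty fields so they still compare consistently.
sub head_fingerprint {
    my ($last_modified, $content_length) = @_;
    return join ':', map { defined $_ ? $_ : '' } ($last_modified, $content_length);
}

# True when there is no stored fingerprint yet, or it differs.
sub has_changed {
    my ($stored, $current) = @_;
    return !defined $stored || $stored ne $current;
}

# Hypothetical usage: perl headcheck.pl https://myweb.server.com/URLsub/some.log state.txt
if (@ARGV == 2) {
    my ($url, $state_file) = @ARGV;
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new->head($url);
    die $res->status_line, "\n" unless $res->is_success;

    # last_modified is epoch seconds, content_length is bytes (either may be undef)
    my $current = head_fingerprint($res->last_modified, $res->content_length);

    my $stored;
    if (open my $in, '<', $state_file) {
        $stored = <$in>;
        close $in;
        chomp $stored if defined $stored;
    }

    if (has_changed($stored, $current)) {
        print "changed\n";    # send the notification mail here
        open my $out, '>', $state_file or die "Can't write $state_file: $!";
        print $out "$current\n";
        close $out;
    } else {
        print "unchanged\n";
    }
}
```

Comparing the stored epoch number avoids any dependence on how `localtime` formats the date, and only a few hundred bytes of headers cross the network per check.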

      If you're dealing with small files then MD5 will be good. The problem is that these files are 12-18MB each and I have no control over them (someone else maintains these files for us). Grabbing an 18MB file from the internet each time sounds like a bad idea to me; doing an HTTP HEAD (I call it "giving HEAD" :-)) is more reasonable and faster.
      So you're generating an MD5 hash of the file at the URL? How does that work? Are you automating this somehow? I'm using the header since I need the "last modified" date anyway.
        The MD5 algorithm is described in RFC 1321 and has its own (unofficial) homepage. RSA Labs currently 'owns' it and publishes an FAQ.
        It's basically a fingerprint algorithm.
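To illustrate the "fingerprint" idea, here is a small sketch using `md5_hex` from Digest::MD5 (a core module in modern Perl). Any input, however large, maps to a 32-hex-digit digest; identical input always gives the identical digest, and changing even one character gives an unrelated one. The values for the empty string and "abc" are the test vectors published in RFC 1321. MD5 is fine for spotting accidental changes like these, though it is no longer considered safe against deliberately crafted collisions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# RFC 1321 test vectors:
#   md5_hex('')    is d41d8cd98f00b204e9800998ecf8427e
#   md5_hex('abc') is 900150983cd24fb0d6963f7d28e17f72
for my $text ('', 'abc', 'abd') {
    printf "%-5s %s\n", "'$text'", md5_hex($text);
}
```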

        For some reason using LWP::Simple's head() doesn't work all that well for me...

        Greetz
        Beatnik
        ... Quidquid perl dictum sit, altum viditur.
Re: Compare file content with a variable content
by Beatnik (Parson) on May 19, 2001 at 19:00 UTC
    I have a similar script that uses LWP::Simple, Digest::MD5 and DBI. I basically store the page checksum in the DB, which is a pretty good method to check if the file has changed. Comparing the full page is kinda overkill IMHO.
    Update: pretty similar to mr.nick's solution above...
    Update2: I was actually considering posting this under CUFP a few days ago...
    #!/usr/bin/perl -w
    use strict;
    use DBI;
    use LWP::Simple;
    use Digest::MD5 qw(md5_hex);

    my $dbh = DBI->connect("dbi:mysql:database", "user", "password",
                           { RaiseError => 1 }) || die "Can't connect";

    ## load the list of monitored URLs
    my $urls = $dbh->selectcol_arrayref("select url from sites");

    ## optionally add a new URL given on the command line
    if (@ARGV) {
        my $url  = $ARGV[0];
        my $page = get($url);
        die "Couldn't fetch $url\n" unless defined $page;
        my ($title) = $page =~ /<TITLE>(.*?)<\/TITLE>/i;
        $title = '' unless defined $title;
        ## placeholders (?) handle quoting, so no manual escaping is needed
        $dbh->do("insert into sites values (?,?,?,?)", undef,
                 $title, $url, md5_hex($page), time());
    }

    my $select = $dbh->prepare("select checksum from sites where url = ?");
    my $update = $dbh->prepare(
        "update sites set name = ?, checksum = ?, date = ? where url = ?");

    foreach my $url (@$urls) {
        my $page = get($url);
        print $url, "\n";
        next unless defined $page;    # skip URLs that couldn't be fetched
        my $digest = md5_hex($page);
        $select->execute($url);
        while (my ($checksum) = $select->fetchrow_array) {
            if ($checksum ne $digest) {
                ## page changed: store the new title, checksum and timestamp
                my ($title) = $page =~ /<TITLE>(.*?)<\/TITLE>/i;
                $title = '' unless defined $title;
                $update->execute($title, $digest, time(), $url);
            }
        }
    }
    $dbh->disconnect || die "Disconnection failed";

    Greetz
    Beatnik
(ar0n: use 'eq') Re: Compare file content with a variable content
by ar0n (Priest) on May 19, 2001 at 19:00 UTC
    In
    if ($modified = $string) { print "the file is the same";
    you're assigning $string to $modified. To test whether they're equal, use the eq operator:
    if ($modified eq $string) { print "the file is the same\n"; }
    If you were to compare numbers, you'd use ==.
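A short sketch of the difference (the sub name `compare_both` is made up for illustration): `eq` compares the characters, while `==` first converts both sides to numbers. Strings that are spelled differently can therefore be `==`-equal. Non-numeric strings such as two different `localtime` dates both convert to 0 (with a warning under `-w`), so `==` would call them equal too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (string-equal, numeric-equal) as 1/0 flags.
sub compare_both {
    my ($x, $y) = @_;
    return ($x eq $y ? 1 : 0, $x == $y ? 1 : 0);
}

for my $pair (['10', '10.0'], ['1e3', '1000'], ['007', '7'], ['42', '42']) {
    my ($eq, $num) = compare_both(@$pair);
    printf "%-5s vs %-5s  eq: %d  ==: %d\n", @$pair, $eq, $num;
}
```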


    Update: The answer to your question is yes: even if it's just one character, the two strings won't match.

    ar0n

      I have tried both approaches (either using 'eq', or putting the scalar value of the string (array) into both variables and comparing them with '=='), but the script always prints "the file is not the same", even though both strings 'look' the same. Could there be a space character or CR somewhere in that file that causes the mismatch?
        The string value was in fact surrounded by space characters, so it was NOT the same as the other string. I added some trimming:
        for ($modified_scalar) { s/^\s+//; s/\s+$//; }
        and it did the trick!
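One likely source of the extra whitespace in the original script: it sets `local $\ = "\n"`, and the output record separator applies to every `print`, including `print DAT "$modified"`, so the stored timestamp ends with a newline that reading the file back preserves. The sketch below wraps the same two substitutions in a hypothetical `trim` sub and shows the comparison before and after trimming. If only a trailing newline is the problem, `chomp` alone would also work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Removes leading and trailing whitespace (spaces, tabs, CR, LF) from a
# copy of the string, leaving the caller's variable untouched.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $modified = 'Sat May 19 18:57:00 2001';     # example value, as built from localtime
my $string   = "Sat May 19 18:57:00 2001\n";   # same value read back from the file

my $before = $modified eq $string       ? 'same' : 'different';
my $after  = $modified eq trim($string) ? 'same' : 'different';
print "without trim: $before\n";   # different
print "with trim:    $after\n";    # same
```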