PerlMonks  

File::Dependencies - a new module looking for a good name (and more discussion)

by Corion (Patriarch)
on Apr 16, 2002 at 18:39 UTC ( [id://159585]=perlquestion )

Corion has asked for the wisdom of the Perl Monks concerning the following question:

I now have a module that I think would be nice to have in many situations, and thus I'm considering a release on the CPAN. The problem is, I don't know if the name I have (until now) is a good name, or if I reinvented the wheel because I didn't find a module that scratched this itch.

The module provides a simple interface to determine if a set of files changed since the last time they were checked. This is mostly intended for long running programs (in my case, a daemon process) that need to reinitialize if their configuration files, their program code or the code of any module they depend on (hence the name) changes.

The code itself is written (and even has 6 tests, which I haven't included so as not to bloat this node even further), but I'd like some feedback on it: things that could be much easier, ideas for other methods of file signatures etc. - also, documentation stuff would be very appreciated, including correction of tyops.

As you most likely won't bother to delve into the module without something to whet your appetite, here are the two examples out of the synopsis. The first example is a long running process that checks from time to time if its configuration files have changed and then takes "appropriate action" :

    use strict;
    use File::Dependencies;

    my $d = File::Dependencies->new(Files=>['Import.cfg','Export.cfg']);

    while (1) {
      my (@changes) = $d->changed;

      if (@changes) {
        print "$_ was changed\n" for @changes;
        $d->update();
      };
      sleep 60;
    };

The second example is another long running process that restarts if any file of its source code (minus required files and done files) changed. What I want to know here is whether the idea of restarting a process like this is a good one to have in the documentation ...

    use strict;
    use File::Dependencies;

    my $files = File::Dependencies->new(Files=>[values %INC, $0]);

    # We want to restart when any module was changed
    exec $0, @ARGV if $files->changed();

As you've now read this far, you most likely also want to have a look at the module itself - here it comes :

    package File::Dependencies;

    #use 5.006; # shouldn't be necessary
    use strict;
    use warnings;

    require Exporter;

    our @ISA = qw(Exporter);
    our $VERSION = '0.01';

    sub new {
      my ($class, %args) = @_;
      my $method = $args{Method} || "MD5";
      my $files = $args{Files} || [];

      my $self = {
        Defaultmethod => $method,
        Files => {},
      };
      bless $self, $class;

      $self->addfile($_) for @$files;

      return $self;
    };

    sub adddependency {
      my ($self,$filename,$method) = @_;
      $method ||= $self->{Defaultmethod};

      my $signatureclass = "Dependency::Signature::$method";
      $self->{Files}->{$filename} = $signatureclass->new($filename);
    };

    sub addfile {
      my ($self,@files) = @_;
      $self->adddependency($_) for @files;
    };

    sub update {
      my ($self) = @_;
      $_->initialize() for values %{$self->{Files}};
    };

    sub changed {
      my ($self) = @_;
      return map {$_->{Filename}} grep {$_->changed()} (values %{$self->{Files}});
    };

    1;

    {
      package Dependency::Signature;

      # This is a case where Python would be nicer. With Python, we could have (paraphrased)
      # class Dependency::Signature;
      #   def initialize(self):
      #     self.hash = self.signature()
      #     return self
      #   def signature(self):
      #     return MD5(self.filename)
      #   def changed(self):
      #     return self.hash != self.signature()
      # and it would work as expected, (almost) regardless of the structure that is returned
      # by self.signature(). This is some DWIMmery that I sometimes miss in Perl.
      # For now, only string comparisons are allowed.

      sub new {
        my ($class,$filename) = @_;
        my $self = {
          Filename => $filename,
        };
        bless $self, $class;
        $self->initialize();
        return $self;
      };

      sub initialize {
        my ($self) = @_;
        $self->{Signature} = $self->signature();
        return $self;
      };

      sub changed {
        my ($self) = @_;
        my $currsig = $self->signature();

        # FIXME: Deep comparison of the two signatures instead of equality !
        # And what's this about string comparisons anyway ?
        if ((ref $currsig) or (ref $self->{Signature})) {
          die "Implementation error in $self : changed() can't handle references (yet) !\n";
          #return $currsig != $self->{Signature};
        } else {
          return $currsig ne $self->{Signature};
        };
      };

      1;
    };

    {
      package Dependency::Signature::mtime;
      use base 'Dependency::Signature';

      sub signature {
        my ($self) = @_;
        my @stat = stat $self->{Filename} or die "Couldn't stat '$self->{Filename}' : $!";
        return $stat[9];
      };

      1;
    };

    {
      package Dependency::Signature::MD5;
      use base 'Dependency::Signature';
      use vars qw( $fallback );

      BEGIN {
        eval "use Digest::MD5;";
        if ($@) {
          #print "Falling back on Dependency::Signature::mtime\n";
          $fallback = 1;
        };
      };

      # Fall back on simple mtime check unless MD5 is available :
      sub new {
        my ($class,$filename) = @_;
        if ($fallback) {
          return Dependency::Signature::mtime->new($filename);
        } else {
          return $class->SUPER::new($filename);
        };
      };

      sub signature {
        my ($self) = @_;
        my $result;
        if (-e $self->{Filename} and -r $self->{Filename}) {
          local *F;
          open F, $self->{Filename} or die "Couldn't read from file '$self->{Filename}' : $!";
          $result = Digest::MD5->new()->addfile(*F)->b64digest();
          close F;
        };
        return $result;
      };

      1;
    };

    1;
    __END__

    =head1 NAME

    File::Dependencies - Perl extension for detection of changed files.

    =head1 SYNOPSIS

      use strict;
      use File::Dependencies;

      my $d = File::Dependencies->new(Files=>['Import.cfg','Export.cfg']);

      while (1) {
        my (@changes) = $d->changed;

        if (@changes) {
          print "$_ was changed\n" for @changes;
          $d->update();
        };
        sleep 60;
      };

    Second example - a script that knows when any of its modules have changed :

      use File::Dependencies;

      my $files = File::Dependencies->new(Files=>[values %INC, $0]);

      # We want to restart when any module was changed
      exec $0, @ARGV if $files->changed();

    =head1 DESCRIPTION

    The Dependencies module is intended as a simple method for programs to detect
    whether configuration files (or modules they rely on) have changed. There are
    currently two methods of change detection implemented, C<mtime> and C<MD5>.
    The C<MD5> method will fall back to use timestamps if the C<Digest::MD5> module
    cannot be loaded.

    =over 4

    =item new %ARGS

    Creates a new instance. The C<%ARGS> hash has two possible keys,
    C<Method>, which denotes the method used for checking as default,
    and C<Files>, which takes an array reference to the filenames to
    watch.

    =item adddependency filename, method

    Adds a new file to watch. C<method> is the method (or rather, the
    subclass of C<Dependency::Signature>) to use to determine whether
    a file has changed or not.

    =item addfile LIST

    Adds a list of files to watch. The method used for watching is the
    default method as set in the constructor.

    =item update

    Updates all signatures to the current state. All pending changes
    are discarded.

    =item changed

    Returns a list of the filenames whose files did change since
    the construction or the last call to C<update> (whichever last
    occurred).

    =back

    =head2 Adding new methods for signatures

    Adding a new signature method is as simple as creating a new subclass
    of C<Dependency::Signature>. See C<Dependency::Signature::MD5> for a simple
    example. There is one point of laziness in the implementation of
    C<Dependency::Signature>, the C<changed> method can only compare strings
    instead of arbitrary structures (yes, there ARE things that are easier in
    Python than in Perl).

    =head2 EXPORT

    None by default.

    =head1 AUTHOR

    Max Maischein, E<lt>corion@informatik.uni-frankfurt.deE<gt>

    =head1 SEE ALSO

    L<perl>,L<Digest::MD5>.

    =cut

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Replies are listed 'Best First'.
Re: File::Dependencies - a new module looking for a good name (and more discussion)
by samtregar (Abbot) on Apr 16, 2002 at 19:10 UTC
    Sounds like a useful module. I thought from the name that it might be some kind of "gcc -M" replacement. I'm glad it's not!

    Comments:

      - You commented out the "use 5.006" and then proceeded to   
        use "our".  Naughty.
    
      - You could get the flexibility you like in Python by
        overloading != in your signature classes, I think.
        I don't know if it would be worth it, but it might
        make you happier with Perl.
    
      - You have "1;" at the end of every package.  This is not
        required.  What is required is a "1;" at the end of
        every file, which you also have.
    
      - Why repeat the class name in the "adddependency" method?
        I think just add() might be clearer.
    
      - The use of mixed case in parameter names runs against
        the grain of most Perl modules ("Method" and "Files").
    
      - This module might be more useful if it provided 
        save() and load() methods to save dependencies to
        a file and load them again.  That way it could be
        used by compilers and other non-long-running programs
        to track up-to-dateness.  Storable should offer an
        easy solution if you like the idea.
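
The overloading idea from the comments above could look roughly like this. It is only a minimal sketch, not code from the module: the package name Sketch::Signature, its value field and the equals method are hypothetical, and only the core overload pragma is assumed.

```perl
use strict;
use warnings;

{
    package Sketch::Signature;    # hypothetical name, not part of the module

    # Route == / != and eq / ne through a single equals() method, so a
    # subclass with a structured signature only has to override equals().
    use overload
        '==' => sub { $_[0]->equals($_[1]) },
        '!=' => sub { !$_[0]->equals($_[1]) },
        'eq' => sub { $_[0]->equals($_[1]) },
        'ne' => sub { !$_[0]->equals($_[1]) },
        '""' => sub { join ',', @{ $_[0]{value} } };

    sub new {
        my ($class, @parts) = @_;
        return bless { value => [@parts] }, $class;
    }

    # Element-wise comparison of two list-valued signatures.
    sub equals {
        my ($self, $other) = @_;
        my @x = @{ $self->{value} };
        my @y = @{ $other->{value} };
        return 0 unless @x == @y;
        $x[$_] eq $y[$_] or return 0 for 0 .. $#x;
        return 1;
    }
}

# Two signatures made of (mtime, size); the values are made up.
my $old = Sketch::Signature->new(1018982340, 4096);
my $new = Sketch::Signature->new(1018982400, 4096);
print "changed\n" if $old != $new;
```

With something like this in place, changed() could compare signature objects with != or ne without caring what they contain.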
    

    That's all for now!

    -sam

Re: File::Dependencies - a new module looking for a good name (and more discussion)
by belg4mit (Prior) on Apr 16, 2002 at 19:01 UTC
    Other than the densification I would perform e.g.
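        sub signature {
          my ($self) = @_;
          my @stat = stat $self->{Filename} or die "Couldn't stat '$self->{Filename}' : $!";
          return $stat[9];
        };
    =>
        sub signature {
          # -M is the file's age in days, relative to the start of the script.
          # "return ... or die" would never reach the die, hence "//" (Perl 5.10+).
          return -M $_[0]->{Filename} // die "Couldn't stat '$_[0]->{Filename}' : $!";
        };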
    It seems like you'd be better off using MTIME as a screen, and then if and only if MTIME is unchanged do further testing. After all, a stat is cheaper than a slurp and hash.

    As for the name I would think File::Modified might be more appropriate.

    UPDATE: Perhaps I should explain the logic behind determining if something is changed. Mtime in and of itself is not complete, as one may touch a file's inode back to its initial setting after editing. Hence we try a different method to check for difference if the Mtime is not changed. If the Mtime is changed, we accept this. Although the inode may have been touched to effect this change, that is acceptable. On the whole this is not such a large performance boost (it only saves you a hashing when the Mtime is different) but it does (at least to me) present a more comprehensive system.
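
The screening logic described above might be sketched as follows. The function names md5_of, snapshot and has_changed are made up for this illustration and are not part of the module; Digest::MD5 is assumed to be installed.

```perl
use strict;
use warnings;
use Digest::MD5;

# Compute the MD5 digest of a file's contents.
sub md5_of {
    my ($file) = @_;
    open my $fh, '<', $file or die "Couldn't read '$file': $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->b64digest;
    close $fh;
    return $digest;
}

# Record both signatures (mtime and digest) for a file.
sub snapshot {
    my ($file) = @_;
    my @stat = stat $file or die "Couldn't stat '$file': $!";
    return { file => $file, mtime => $stat[9], md5 => md5_of($file) };
}

# Cheap test first: a different mtime is accepted as a change without
# hashing. Only when the mtime is unchanged is the content hashed.
sub has_changed {
    my ($snap) = @_;
    my @stat = stat $snap->{file} or die "Couldn't stat '$snap->{file}': $!";
    return 1 if $stat[9] != $snap->{mtime};
    return md5_of($snap->{file}) ne $snap->{md5};
}
```

A caller would keep the hash returned by snapshot() and pass it to has_changed() on each check. The hashing cost is paid only for files whose mtime still matches.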

    Also if performance is of concern, or may be for the users, you may consider hashing only a portion of the file. Say the first 32K, or the first and last 16K, etc. since the Digest module reads the entire file with the addfile method.
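
A rough sketch of hashing only the head of a file is shown below. The function name partial_md5 and the 8K read size are arbitrary choices for this illustration. As far as I know, addfile reads the handle in blocks rather than slurping it, so the saving is in I/O and CPU time, not memory. Note that any change beyond the limit goes undetected, so this is probably best combined with a size or mtime check.

```perl
use strict;
use warnings;
use Digest::MD5;

# Digest only the first $limit bytes of a file (32K by default).
sub partial_md5 {
    my ($file, $limit) = @_;
    $limit = 32 * 1024 unless defined $limit;

    open my $fh, '<', $file or die "Couldn't read '$file': $!";
    binmode $fh;

    my $md5       = Digest::MD5->new;
    my $remaining = $limit;
    while ($remaining > 0) {
        my $want = $remaining < 8192 ? $remaining : 8192;
        my $got  = read($fh, my $buffer, $want);
        die "Read error on '$file': $!" unless defined $got;
        last if $got == 0;    # end of file reached before the limit
        $md5->add($buffer);
        $remaining -= $got;
    }
    close $fh;
    return $md5->b64digest;
}
```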

    --
    perl -pe "s/\b;([mnst])/'\1/mg"

(crazyinsomniac) Re: File::Dependencies - a new module looking for a good name (and more discussion)
by crazyinsomniac (Prior) on Apr 16, 2002 at 19:46 UTC
Re: File::Dependencies - a new module looking for a good name (and more discussion)
by Dragonfly (Priest) on Apr 17, 2002 at 00:40 UTC
    I like File::Modified (and File::Dependencies, for that matter), but File::Snapshot and File::Monitor also popped into my head for some reason.

    Seems like a very useful module, and yes, I think it's totally different than the cron-onymous monk's accusation. Nice work. :-)

      I agree that File::Dependencies may be misleading. File::Modified, File::Changed, File::Updated all sound much more accurate to me.

      The suggestion to add Save/Load methods is something I would seriously consider. That and a method to tell the time between updates would make this module much more flexible.

      Cheers,
      -Dogma

        Storing the file signatures should be easy to add, as I will most likely add a stringification routine to the File::Signature class; retrieving the signatures from a file would then mean parsing the file back in - something which would get nasty with weird filenames - here, either a small DBMS or a tied DB would be necessary to store the data, not nice, but feasible.
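
For comparison, the Storable route suggested earlier in the thread avoids parsing filenames back in, because the whole hash is serialized as data. The following is only a sketch under that assumption: the function names save_signatures, load_signatures and changed_since_save are hypothetical, and signatures are taken to be plain strings keyed by filename.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Save a { filename => signature } hash. The keys are stored as data,
# so unusual characters in filenames need no escaping.
sub save_signatures {
    my ($signatures, $store_file) = @_;
    nstore($signatures, $store_file)
        or die "Couldn't save signatures to '$store_file'";
    return;
}

# Load the hash back; returns an empty hash if nothing was saved yet.
sub load_signatures {
    my ($store_file) = @_;
    return {} unless -e $store_file;
    return retrieve($store_file);
}

# List the filenames that are new or whose signature differs from the
# stored one.
sub changed_since_save {
    my ($stored, $current) = @_;
    my @changed = grep {
        !exists $stored->{$_} || $stored->{$_} ne $current->{$_}
    } keys %$current;
    return sort @changed;
}
```

A short-lived program could call load_signatures() at startup, compare against freshly computed signatures, and call save_signatures() before exiting.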

        Getting the difference between two signatures does not always have a meaning, for example the difference between two MD5 checksums. I could add the method to always store the timestamps, but timestamps are not always something you want to rely on - I'm thinking about NFS mounts with jumping clock times. A method to tell the time since the last update to the file-database is a responsibility of the main program and not of the module.

Re: File::Dependencies - a new module looking for a good name (and more discussion)
by Corion (Patriarch) on Apr 22, 2002 at 21:00 UTC

    The module is now online on the CPAN under File::Modified. Most of the recommendations here have been implemented or are being looked into (for example, Struct::Compare and Test::More for implementing/stealing a deep structure comparison method), even persistence for the file signatures has half-heartedly been implemented (with no regard for weird characters within the filenames).
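
To illustrate what a deep structure comparison of signatures involves, here is a hand-rolled sketch. It is not the code of Struct::Compare or Test::More, the function name same_signature is made up, and it handles only plain scalars, array references and hash references.

```perl
use strict;
use warnings;

# Recursively compare two signatures built from plain scalars, array
# references and hash references. Returns true if they match.
sub same_signature {
    my ($x, $y) = @_;
    return 0 if ref($x) ne ref($y);

    if (ref($x) eq 'ARRAY') {
        return 0 unless @$x == @$y;
        same_signature($x->[$_], $y->[$_]) or return 0 for 0 .. $#$x;
        return 1;
    }
    if (ref($x) eq 'HASH') {
        return 0 unless scalar(keys %$x) == scalar(keys %$y);
        for my $key (keys %$x) {
            return 0 unless exists $y->{$key};
            return 0 unless same_signature($x->{$key}, $y->{$key});
        }
        return 1;
    }
    die "Unsupported reference type: " . ref($x) if ref($x);

    # Plain scalars: undef only matches undef, everything else is
    # compared as a string.
    return 0 if defined($x) xor defined($y);
    return 1 unless defined $x;
    return $x eq $y;
}
```

A changed() method could then return the negation of same_signature() on the stored and current signatures instead of dying on references.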

    The released version has the complete interface specified (or so I hope, hah), what remains are some improvements with the implementation.

    Thanks to everybody for their hints.

Re: File::Dependencies - a new module looking for a good name (and more discussion)
by Anonymous Monk on Apr 16, 2002 at 21:22 UTC
    I would call your module reinvented::cron::make :)
      I don't think that's a very fair assessment. It is true that some crons (I can only speak of my own and Vixie cron for certain) make use of such functionality, but so could many other long running daemons. Almost anything which implements some sort of caching, e.g. a finger daemon that caches .plan files.

      Specifically where none of the caching modules on CPAN are appropriate. They are centered around IPC. This is useful for caching where the filesystem is authoritative and significant processing must be done on every file.

