in reply to the sands of time(in search of an optimisation)

If you're dealing with Win32/NTFS systems, then there are a couple of vastly more efficient ways to achieve your aim.

The first is to use the Change Notify mechanism (see Win32::ChangeNotify). This involves creating a daemon or service that registers your interest in changes to the file system. It runs perpetually in the background, blocking on a change notification, and updates the database as the changes occur. Very efficient, but it requires a permanently running process.
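A minimal sketch of such a daemon, assuming the Win32::ChangeNotify interface (`new($path, $watch_subtree, $filter)`, `wait`, `reset`); check the module's documentation for the exact filter names. A change notification only says that *something* in the watched directory changed, not which file, so the sketch rescans the directory and diffs modification times against the previous snapshot. The watched path and the `update_db` hook are hypothetical placeholders.

```perl
use strict;
use warnings;

# Snapshot a single directory as: file name => mtime.
sub snapshot {
    my ($dir) = @_;
    my %mtime;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    for my $name (readdir $dh) {
        next unless -f "$dir/$name";       # -f fills the '_' stat buffer
        $mtime{$name} = (stat _)[9];
    }
    closedir $dh;
    return \%mtime;
}

# Names that are new, or whose mtime differs, between two snapshots.
sub changed_files {
    my ($before, $after) = @_;
    my @changed = grep {
        !exists $before->{$_} || $before->{$_} != $after->{$_}
    } keys %$after;
    return sort @changed;
}

# Hypothetical hook: replace with whatever updates your database.
sub update_db { print "would update DB for $_[0]\n" }

# The blocking loop only runs on Windows, and only when run as a script.
if ($^O eq 'MSWin32' && !caller) {
    require Win32::ChangeNotify;
    my $dir    = 'C:/data';                # hypothetical directory to watch
    my $notify = Win32::ChangeNotify->new($dir, 0, 'FILE_NAME LAST_WRITE')
        or die "cannot watch $dir: $^E";
    my $seen = snapshot($dir);
    while (1) {
        $notify->wait or last;             # blocks until something changes
        $notify->reset;                    # re-arm before rescanning
        my $now = snapshot($dir);
        update_db("$dir/$_") for changed_files($seen, $now);
        $seen = $now;
    }
}
```

Watching a whole tree (second argument 1) works the same way, but the rescan then has to walk the tree as well, which is where some of the brute-force cost creeps back in.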

The second is to use Change Journaling. Basically this involves asking the system to record all filesystem changes in a journal (file). You can then periodically retrieve those changes, update your DB and reset the journal.
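The journal approach turns the job into periodic batch processing. The sketch below only simulates it in pure Perl: the journal is a plain array of hypothetical `{ path, deleted }` records (a real NTFS journal is read through DeviceIoControl with FSCTL_READ_USN_JOURNAL, which this does not attempt), and `$rehash` is a hypothetical callback. Each pass drains the pending records, standing in for resetting the journal, and keeps only the last record per file, so a file written many times between passes is rehashed once.

```perl
use strict;
use warnings;

# Apply one batch of journal records to %$db (path => checksum).
# Records are hypothetical hashrefs: { path => ..., deleted => 0|1 }.
sub apply_journal {
    my ($db, $journal, $rehash) = @_;
    my %last;
    $last{ $_->{path} } = $_ for @$journal;   # later records win
    @$journal = ();                           # "reset" the journal
    for my $path (sort keys %last) {
        if ($last{$path}{deleted}) {
            delete $db->{$path};
        }
        else {
            $db->{$path} = $rehash->($path);
        }
    }
    return scalar keys %last;                 # number of distinct files handled
}
```

Against a real journal you would normally remember the last USN processed and continue from there on the next pass, rather than clearing anything.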

The following comes from Microsoft's documentation on change journals:

An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.

To avoid these disadvantages, the NTFS file system maintains a change journal. When any change is made to a file or directory in a volume, the change journal for that volume is updated with a description of the change and the name of the file or directory.

The first method has a *nix equivalent (inotify on Linux, for example), and I believe some *nix filesystems are capable of the second.



Re^2: the sands of time(in search of an optimisation)
by spx2 (Deacon) on Mar 04, 2008 at 07:12 UTC
    Hello,

    Thank you.
    I have talked with some people on IRC, and they talked about a similar approach: hooking the system's low-level
    API so as to be notified when a file has changed.
    On Linux in particular, I think I should hook the fclose function by writing a kernel module to
    replace it with a user-defined one.
        Thank you very much for your suggestion.
        Yours and BrowserUk's were actually the same idea,
        so I read up on it a bit and wrote a daemon
        that sits in the background and monitors files for changes.
        The following is what I came up with (tested, and it works excellently):
        #!/usr/bin/perl
        # This lightweight daemon runs in the background and, for each file
        # that is modified and closed, re-hashes it and updates its entry
        # in the database.
        #
        # Bugs encountered:
        # 1) does not update the new SHA-1 in the database as expected
        # 2) $mtime seems to be very VERY different from DateTime->now,
        #    which is weird
        # 3) SHA1db::find_or_update does not find correctly if there are
        #    any files in the db with that name
        use strict;
        use warnings;
        use Linux::Inotify2;
        use DateTime;
        use SHA1db;                  # the author's own database module
        use YAML qw/LoadFile/;

        $| = 1;                      # unbuffered output
        SHA1db->connect();

        my $inotify = Linux::Inotify2->new
            or die "unable to create inotify object: $!";

        my $config_path = 'config.yml';
        my $config      = LoadFile($config_path);

        for my $dir (map { $_->{path} } @{ $config->{directories} }) {
            print "tracking $dir\n";
            # only IN_CLOSE_WRITE events are handled below
            $inotify->watch($dir, IN_CLOSE_WRITE)
                or die "cannot watch $dir: $!";
        }

        while (1) {
            my @events = $inotify->read;     # blocks until events arrive
            unless (@events) {
                print "read error: $!\n";
                last;
            }
            for my $event (@events) {
                next unless $event->mask & IN_CLOSE_WRITE;
                my $file = $event->fullname;
                # take size and mtime from one lstat, so later calls
                # cannot clobber the '_' stat buffer before -s _ runs
                my ($size, $mtime) = (lstat $file)[7, 9];
                next unless $mtime;
                printf "updating checksum_db for file %s modified now: %s\n",
                    $file, DateTime->from_epoch(epoch => $mtime);
                # TODO: check that this file passes the regex filter,
                # link filter, dir filter and size filter
                unless (SHA1db::find_or_update($file, $mtime)) {
                    print "not found in db, adding...\n";
                    SHA1db::add_to_db(SHA1db::file2sha1($file), $mtime, $size, $file);
                }
            }
        }
        print "daemon stopped\n";
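Bugs 1 and 3 noted in the script live in SHA1db, whose code isn't shown, so this is only a guess: if lookups match on the bare file name, two same-named files in different directories will collide. A minimal in-memory stand-in keyed on the full path, using the core Digest::SHA module. The `%db` layout and `refresh_checksum` are hypothetical and are not SHA1db's API.

```perl
use strict;
use warnings;
use Digest::SHA;

my %db;    # hypothetical store: full path => { mtime => ..., sha1 => ... }

# Rehash $path if it is new or its mtime changed; returns 1 if the
# checksum was (re)computed, 0 if the stored record was still current.
# Keying on the full path avoids clashes between same-named files.
sub refresh_checksum {
    my ($path, $mtime) = @_;
    my $rec = $db{$path};
    return 0 if $rec && $rec->{mtime} == $mtime;    # unchanged
    my $sha1 = Digest::SHA->new(1)->addfile($path, 'b')->hexdigest;
    $db{$path} = { mtime => $mtime, sha1 => $sha1 };
    return 1;
}
```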