When I need to know if a file has changed on me (Linux), I check the file size, device, inode and modified time. I don't usually need to check files for change, so I generally code it up from scratch as required.

For a change I thought I'd wrap it up into a sub and reuse it where needed. Many eyes will hopefully catch anything I've missed. This sub takes a filename, a file handle or nothing (in which case the currently stored value of stat is used) and returns a unique string representing the current state of the file. The same file (handle) will continue to return the same string until the file changes.

One obvious pitfall I can think of is someone could change the file contents (keeping the same size) then reset the mod time to match the original, but I'm ignoring that. It also occurs to me that this may not be accurate where links are involved.
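To make that pitfall concrete, here is a minimal sketch of it. The helper name file_signature is made up for the demo; it joins the same four stat fields the sub uses. It assumes a Unix-like system where utime can reset the timestamps.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper for this demo: size, dev, inode and mtime.
sub file_signature {
    my ($file) = @_;
    return join ':', (stat($file))[7,0,1,9];
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "AAAA\n";
close $fh;

# Remember the original timestamps and signature.
my ($atime, $mtime) = (stat($path))[8,9];
my $before = file_signature($path);

# Overwrite in place with different content of the same length.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "BBBB\n";
close $out;

# Put the original timestamps back.
utime $atime, $mtime, $path or die "utime failed: $!";

my $after = file_signature($path);
print "change went unnoticed\n" if $before eq $after;
```

The contents differ, but every field in the signature is the same as before.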

Note:
I'm not trying to make it work in all environments (I'm pretty sure it won't work as expected on Windows).
I'm not trying to positively test that the file contents have changed, only that they 'may' have changed.
I'm also trying to avoid calling in another module to do the job.
# pass in a file name, a file handle or nothing
# returns a unique string that changes when the file changes
sub FileMeta {
    if (my $fh = shift) {
        my ($dev, $inode, $size, $mtime) = (stat($fh))[0,1,7,9];
        return "$size:$dev;$inode:$mtime";
    }
    else {
        my ($dev, $inode, $size, $mtime) = (stat(_))[0,1,7,9];
        return "$size:$dev;$inode:$mtime";
    }
}
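A quick usage sketch, in case it helps: take the string once, let something modify the file, take it again and compare. The temp file and the variable names are only there for the demo, and FileMeta is repeated in condensed form so the snippet runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Condensed copy of the sub so this snippet is self-contained.
sub FileMeta {
    my $fh = shift;
    my ($dev, $inode, $size, $mtime) = ($fh ? stat($fh) : stat(_))[0,1,7,9];
    return "$size:$dev;$inode:$mtime";
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first\n";
close $fh;

my $before = FileMeta($path);

# Append data so the size changes.
open my $out, '>>', $path or die "Cannot append to $path: $!";
print {$out} "second\n";
close $out;

my $after = FileMeta($path);
print $before eq $after ? "unchanged\n" : "may have changed\n";
```

Here the append changes the size, so the two strings differ and the script reports that the file may have changed.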

Re: Did that file change?
by davidrw (Prior) on Jun 01, 2006 at 02:01 UTC
    No need to duplicate all the code (now changes are made in one place, not two):
    sub FileMeta {
        my $fh = shift;
        my ($dev, $inode, $size, $mtime) = ($fh ? stat($fh) : stat(_))[0,1,7,9];
        return join ":", $size, $dev, $inode, $mtime;
    }
    (I also assumed the ';' in the return string was a typo)

    Could also do it w/o temp vars:
    sub FileMeta {
        my $fh = shift;
        # glue together the size, dev, inode, and mtime
        return join ":", ($fh ? stat($fh) : stat(_))[7,0,1,9];
    }
    Another approach (though you noted avoiding modules) would be to MD5 the string and return that so that .. or to include in the unique string the MD5 of the contents (of course that's more overhead as well).
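A rough sketch of the second idea, adding an MD5 of the contents to the string. The name FileMetaMD5 is made up here, it only accepts a file name, and it pulls in Digest::MD5 (a core module, but still a module, which the original post wanted to avoid). Reading the whole file on every check is the extra overhead mentioned above.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical variant: stat fields plus an MD5 of the file contents.
sub FileMetaMD5 {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my @fields = (stat($fh))[7,0,1,9];    # size, dev, inode, mtime
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return join ':', @fields, $digest;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "AAAA\n";
close $tmp;

my ($atime, $mtime) = (stat($path))[8,9];
my $before = FileMetaMD5($path);

# Same-size rewrite, then restore the original timestamps.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "BBBB\n";
close $out;
utime $atime, $mtime, $path or die "utime failed: $!";

my $after = FileMetaMD5($path);
print "contents differ\n" if $before ne $after;
```

The stat fields are identical before and after, so only the digest part of the string gives the change away.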

    Also have to toss in the obligatory mention of Tripwire for really robust monitoring/reporting (though of course a little snippet like this is handy for certain cases).
      You could also replace $fh ? stat($fh) : stat(_) with stat($fh || *_) .
        cool! I was hoping someone would provide something like that since my original attempt of stat($fh || _) didn't work; I also abandoned an ugly eval attempt.

        So how does *_ work/what does the asterisk in that context mean?
        Sweet! Thanks! Then the whole function boils down to a one liner:
        # note: order of the joined parts is not particularly important
        sub FileMeta { join ":", (stat(shift || *_))[0,1,7,9] }
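For what it's worth, a small sketch exercising the one-liner all three ways: by name, by handle, and with no argument after a file test has filled the cached stat buffer. As far as I understand it, *_ is the typeglob whose filehandle slot is the special _ handle, which is why stat accepts it where a bare _ inside the || expression would not parse as a handle. The temp file and variable names are just for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The one-liner from above: fall back to the *_ glob when no argument is given.
sub FileMeta { join ":", (stat(shift || *_))[0,1,7,9] }

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "some data\n";
close $fh;

my $by_name = FileMeta($path);      # stat by file name

open my $in, '<', $path or die "Cannot open $path: $!";
my $by_handle = FileMeta($in);      # stat by file handle
close $in;

my $cached;
if (-e $path) {                     # -e fills the cached stat buffer
    $cached = FileMeta();           # no argument: reuse that buffer via *_
}

print "all three agree\n"
    if $by_name eq $by_handle && $by_handle eq $cached;
```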
      Nice! Thanks!

      I didn't like duping the code, but the stat(_) came as an afterthought, so my brain was stuck wondering how I would pass _ into the sub. Good call.

      The ';' actually wasn't a typo. It came from a previous use where it made splitting the string easier (to do what I can't remember). But in all honesty I can't think of any valid reason to do it now (or ever again). Skipping the temp variables is a nice touch too.
Re: Did that file change? (-size, +ctime)
by tye (Sage) on Jun 03, 2006 at 05:37 UTC

    File size is certainly a "feel good" item to include. But if you want a fool-proof system, then you shouldn't need to check file size. If your system might miss certain types of changes if file size weren't included, then it can certainly miss those same types of changes that also preserve file size.

    What types of changes might your system miss? The simplest is restoring mtime via utime. Recently it was proposed that simply testing mtime and ctime would be enough. But that can be fooled by renaming directories, so I would personally use dev, inode, mtime, and ctime and might leave size out of it.

    Of course, adding ctime into the picture means that "chmod" will cause the file to appear changed. Of course, you can also touch(1) the file (or quite a few other similar actions) to make the file appear changed even though the data inside of it remains the same. So I don't mind that ctime adds a few more false positives, in part because it removes some important false negatives, but especially because it removes all of the false negatives (I believe) that are possible without root access (on common Unix file systems).
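A sketch of that trade-off, assuming a Unix-like file system with ordinary ctime behaviour. The name FileMetaCtime is made up; it joins dev, inode, mtime and ctime and leaves size out. The sleeps are there because plain stat reports whole seconds, so two changes inside the same second would look identical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical variant: dev, inode, mtime and ctime, no size.
sub FileMetaCtime {
    my ($file) = @_;
    return join ':', (stat($file))[0,1,9,10];
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "AAAA\n";
close $fh;

my ($atime, $mtime) = (stat($path))[8,9];
my $original = FileMetaCtime($path);

sleep 2;    # let the clock move past the current second

# Same-size rewrite followed by an mtime reset.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "BBBB\n";
close $out;
utime $atime, $mtime, $path or die "utime failed: $!";

my $tampered = FileMetaCtime($path);
print "rewrite + utime detected\n" if $tampered ne $original;

sleep 2;

# A permissions change alone also bumps ctime (a false positive).
chmod 0644, $path or die "chmod failed: $!";
my $chmodded = FileMetaCtime($path);
print "chmod detected\n" if $chmodded ne $tampered;
```

The utime call can put mtime back but cannot set ctime, which is what catches the rewrite; the chmod shows the price, since the data is untouched and the string still changes.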

    Note that ctime is a bit different on some file systems and dev and/or inode number might be useless on some file systems. I'm disappointed and surprised that Perl doesn't provide inode numbers on Win32 -- I thought there was an equivalent available, but I don't have those details "swapped in" at the moment. So I'd probably still include size just for such systems. YMMV.

    - tye        

      I'm disappointed and surprised that Perl doesn't provide inode numbers on Win32 -- I thought there was an equivalent available, but I don't have those details "swapped in" at the moment.

      This may help.

