in reply to (OT) indexing pdf archive content in a multiuser environment- how do i know when the content changed?

A good and reasonably fast checksum like MD5 sounds like a good idea.

But first, I would just check the files' modification times (see -M and possibly -C). Depending on the OS and filesystem, it can be safe to assume that a file hasn't changed if neither of these has changed, at least if you check only once a day (the timestamps might have only a one-second resolution, so a change made within the same second as a check can go unnoticed).

For large files, checking the -C and -M attributes will be a lot faster than reading the file's contents, since the timestamps can be read without opening the file at all.
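
A minimal sketch of that two-step check, assuming the indexer keeps a hash of what it saw on the previous run. Note that -M and -C return ages in days relative to the script's start time, so the sketch uses stat instead to get absolute timestamps that can be stored between runs. The names time_signature, md5_of, has_changed and the layout of the %seen hash are made up for illustration; Digest::MD5 is a core module.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return "mtime:ctime" for a file, taken from stat (epoch seconds).
sub time_signature {
    my ($path) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!";
    return join ':', @st[9, 10];    # 9 = mtime, 10 = ctime
}

# Hex MD5 digest of the file's contents.
sub md5_of {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# $seen maps path => { sig => ..., md5 => ... } from the previous run
# (hypothetical layout). Returns 1 if the contents differ from the record.
sub has_changed {
    my ($path, $seen) = @_;
    my $sig = time_signature($path);
    my $old = $seen->{$path};

    # Cheap path: both timestamps match, so assume nothing changed.
    return 0 if $old && $old->{sig} eq $sig;

    # Expensive path: timestamps moved (or the file is new), so read the data.
    my $md5     = md5_of($path);
    my $changed = !$old || $old->{md5} ne $md5;
    $seen->{$path} = { sig => $sig, md5 => $md5 };
    return $changed ? 1 : 0;
}
```

Calling has_changed($path, \%seen) for every file in the archive only pays for an MD5 when a timestamp has moved; a file that was merely touched is hashed once and then reported as unchanged.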

Re^2: (OT) indexing pdf archive content in a multiuser environment- how do i know when the content changed?
by varian (Chaplain) on May 26, 2007 at 08:12 UTC
    You probably want not only to check the file modification time but also to make sure that the file is not currently being modified.
    Usually it is sufficient to leave out files that have changed in the last minute or so. If you need to be absolutely sure that the file is not in use, then call the *nix command lsof to list open files (it needs root privileges to work system-wide).
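
    The "leave out files that changed in the last minute" rule could look roughly like this. It is only a sketch: the 60-second threshold and the names QUIET_SECONDS, is_settled and settled_files are assumptions, and it does not attempt the lsof check.

```perl
use strict;
use warnings;

# Hypothetical threshold: a file counts as settled once it has been
# left alone for this many seconds.
use constant QUIET_SECONDS => 60;

# True if the file's mtime is at least QUIET_SECONDS in the past.
# $now is optional and defaults to the current time.
sub is_settled {
    my ($path, $now) = @_;
    $now = time unless defined $now;
    my $mtime = (stat $path)[9];
    die "Cannot stat $path: $!" unless defined $mtime;
    return ($now - $mtime) >= QUIET_SECONDS ? 1 : 0;
}

# Keep only the files that look safe to index right now.
sub settled_files {
    my @paths = @_;
    return grep { is_settled($_) } @paths;
}
```

    Files that are filtered out are simply picked up on a later run, once they have been quiet for long enough.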
Re^2: (OT) indexing pdf archive content in a multiuser environment- how do i know when the content changed?
by leocharre (Priest) on May 26, 2007 at 00:33 UTC

    That is actually very helpful to me. I can make one pass testing mtime, queue the files whose modification times differ, and then test only those with md5sum. Very interesting.

    Of course there *is* a remote possibility that an absolute path will be reused and the modification time will be the same, yet the data will be different. It *is* possible.

    Still, I like it. The only thing I can think of that would get around all these problems would be to mess with the guts of the filesystem itself, which I am almost tempted to learn more about someday.

    Thank you. This is helpful.

      The only time the mtime AND ctime will be the same although the contents have changed is if the file changed within the same time unit (the same second, or whatever resolution the file system uses) as the last check. You can safely ignore anything that hasn't changed for 2 checks.

      update: slightly stronger wording: ctime AND mtime
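
      One way to read the "2 checks" rule, as a sketch only: the first check after the timestamps move indexes the file, the next check with identical timestamps verifies the contents once more (in case a write landed in the same second as the previous check), and after that the file is skipped until its timestamps move again. The function name next_action, the action names and the state layout are all made up for illustration.

```perl
use strict;
use warnings;

# $state maps path => { sig => "mtime:ctime", quiet => N }, where N counts
# consecutive checks in which the signature matched the stored one.
# Returns 'index'  (timestamps moved, or the file is new),
#         'verify' (same timestamps, but a write may have raced the last check),
#         'skip'   (timestamps have now been quiet for two checks).
sub next_action {
    my ($state, $path, $sig) = @_;
    my $rec = $state->{$path};

    if (!$rec || $rec->{sig} ne $sig) {
        $state->{$path} = { sig => $sig, quiet => 0 };
        return 'index';
    }

    $rec->{quiet}++;
    return $rec->{quiet} >= 2 ? 'skip' : 'verify';
}
```

      The $sig argument would come from stat, for example join(':', (stat $path)[9, 10]), so that both mtime and ctime take part in the comparison.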

        Except that it's possible for a user to intentionally subvert a scheme based on such checks by using touch.
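
        A small experiment along these lines, assuming a typical Unix filesystem: Perl's utime can backdate the mtime much as touch does, but the ctime is set by the system at the moment of the call, so a check that compares ctime as well would still notice. This is only a sketch; the ctime field is not portable (on Windows it holds the creation time), and it is no defence against someone who can change the system clock.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file, then backdate its mtime the way touch could.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "some data\n";
close $fh or die "Cannot close $file: $!";

my $backdated = time - 3600;    # one hour ago
utime($backdated, $backdated, $file) or die "Cannot utime $file: $!";

# mtime now carries the backdated value; ctime is whatever the system recorded.
my ($mtime, $ctime) = (stat $file)[9, 10];
printf "mtime: %d (backdated)\nctime: %d (set by the system)\n", $mtime, $ctime;
```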