in reply to Best practices for file synchronization? (Mod time vs. contents compare)

A combination of modification-time and file-size comparison is pretty strong, but it won't catch every change. Files can be back-dated, and changes to file attributes may or may not update the modification time.
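
For what it's worth, here is a minimal sketch of that cheap check, using only Perl's built-in stat. The helper name same_size_and_mtime is made up for illustration; treat it as a starting point rather than a finished sync routine.

```perl
use strict;
use warnings;

# Returns true when two files have the same size and modification time.
# This is a cheap heuristic: it never reads the file contents.
sub same_size_and_mtime {
    my ($src, $dst) = @_;
    my @s = stat($src) or die "Cannot stat $src: $!";
    my @d = stat($dst) or return 0;    # a missing destination counts as different
    # Element 7 of stat's list is the size in bytes, element 9 is the mtime.
    return $s[7] == $d[7] && $s[9] == $d[9];
}
```

Note that two files with different contents but identical size and mtime will still be reported as the same, which is exactly the blind spot described above.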

I think you should at least try using MD5s before concluding that they're too slow. Digest::MD5 is pretty fast, in my experience.
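
Something along these lines should be enough to try it out. The sub name md5_of_file is just an illustration; new, addfile and hexdigest are the standard Digest::MD5 object interface.

```perl
use strict;
use warnings;
use Digest::MD5;

# Returns the hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # matters on Windows, where text mode translates line endings
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}
```

Two files are then considered identical when md5_of_file returns the same string for both.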

-sam

Re^2: Best practices for file synchronization? (Mod time vs. contents compare)
by OfficeLinebacker (Chaplain) on Jun 12, 2006 at 18:00 UTC
    Thanks, I'll give MD5 a try, assuming it's available. Also, as I understand it, File::Compare stops comparing once a difference is found, whereas calculating an MD5 sum requires reading the entire file. It will be interesting to see how the two benchmark against each other.
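
For reference, a minimal sketch of the File::Compare side of that comparison. The wrapper name files_differ is hypothetical; compare() itself is the function the module exports, and it returns 0 for equal files, 1 for different files, and -1 on error.

```perl
use strict;
use warnings;
use File::Compare;

# Returns true when the two files have different contents.
sub files_differ {
    my ($src, $dst) = @_;
    my $result = compare($src, $dst);
    die "Cannot compare $src and $dst: $!" if $result < 0;
    return $result == 1;
}
```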

    Thanks,

    T

    _________________________________________________________________________________

    I like computer programming because it's like Legos for the mind.

      He's doing "one-way synchronization of directory trees In Windows". If that means he's doing archiving, he only needs to compute the MD5 of the archived file once. That means that MD5 requires that one file be read fully, whereas File::Compare requires two files to be read in part or in full.
        Ikegami, you raise a very interesting point. So where would I store the already-calculated MD5s from last time? In a text file of some sort?
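
A text file would be one option. Here is a rough sketch, assuming a hypothetical cache file with one "digest, tab, path" line per file; the file layout and all of the sub names are invented for illustration, and only the Digest::MD5 calls come from the module itself.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Reads "digest<TAB>path" lines into a hash ref keyed by path.
# A cache file that does not exist yet yields an empty hash.
sub load_digests {
    my ($cache_file) = @_;
    my %digest_for;
    return \%digest_for unless -e $cache_file;
    open(my $fh, '<', $cache_file) or die "Cannot open $cache_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($digest, $path) = split /\t/, $line, 2;
        $digest_for{$path} = $digest if defined $path;
    }
    close($fh);
    return \%digest_for;
}

# Writes the hash back out in the same format.
sub save_digests {
    my ($cache_file, $digest_for) = @_;
    open(my $fh, '>', $cache_file) or die "Cannot write $cache_file: $!";
    print {$fh} "$digest_for->{$_}\t$_\n" for sort keys %$digest_for;
    close($fh) or die "Cannot close $cache_file: $!";
}

# True when the file has no recorded digest or its digest has changed.
sub changed_since_last_sync {
    my ($path, $digest_for) = @_;
    my $previous = $digest_for->{$path};
    return !defined($previous) || $previous ne md5_of_file($path);
}
```

The idea would be to call load_digests once at startup, copy whatever changed_since_last_sync flags, store the new digest in the hash after each successful copy, and call save_digests at the end.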

        Results in a bit.

Re^2: Best practices for file synchronization? (Mod time vs. contents compare)
by OfficeLinebacker (Chaplain) on Jun 12, 2006 at 20:27 UTC
    OK, so I benchmarked File::Compare vs. using MD5 digests vs. using modification times.

    After four runs, I got an average time of about 42 seconds using full comparison, about 48 seconds for MD5 (calculating for just the source, as the destination presumably already has one), and about 11 seconds for just modification times using stat().
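
For anyone who wants to repeat this kind of test, here is a rough sketch of how it could be set up with the core Benchmark module. This is not the script that produced the numbers above; the sub names and the list of [source, destination] pairs are assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Digest::MD5;
use File::Compare;

# Strategy 1: modification times only, via stat().
sub same_mtime {
    my ($src, $dst) = @_;
    my @s = stat($src) or return 0;
    my @d = stat($dst) or return 0;
    return $s[9] == $d[9];
}

# Strategy 2: full content comparison of both files.
sub same_contents {
    my ($src, $dst) = @_;
    return compare($src, $dst) == 0;
}

# Strategy 3: MD5 of the source only (the destination digest is assumed stored).
sub source_md5 {
    my ($src) = @_;
    open(my $fh, '<', $src) or die "Cannot open $src: $!";
    binmode($fh);
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Runs each strategy once over all pairs and returns name => Benchmark object.
sub time_strategies {
    my ($pairs) = @_;
    my %check = (
        mtime   => sub { same_mtime(@$_)     for @$pairs },
        compare => sub { same_contents(@$_)  for @$pairs },
        md5     => sub { source_md5($_->[0]) for @$pairs },
    );
    my %timing;
    $timing{$_} = timeit(1, $check{$_}) for sort keys %check;
    return \%timing;
}
```

Printing timestr($timing->{$name}) for each name gives wall-clock and CPU figures. Since the operating system caches file data, whichever strategy runs first may look slower than it really is, so it is probably worth running the whole thing more than once.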

    I think I am just going to go with modification time.

    Thanks,

    T.
