I am writing a Perl script that does a one-way synchronization of directory trees in Windows, and I am wondering what the industry considers "good enough" for comparison purposes.
For the record, I'd like this to be a clickable icon on a desktop that can simply be emailed to people. Cygwin is not installed on our desktops, so rsync is out. Also, using any modules that aren't in core Perl is out. The above is based on my understanding, and if you know of a fairly simple way to combine those things into a single clickable script, please educate me.
However, I would like this discussion to be more general: how to weigh robustness against efficiency (speed) in a production synching program.
I think most tools work on modification time. My current tool uses full file comparison, and of course suffers in the performance department.
Is there a happy medium between full file comparison and mod time compare? To that end, two thoughts I have had are a) do most comparison by mod time, but do a full comparison on a small randomly selected subset each time (the synch will occur daily, hopefully). b) use MD5 sums.
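For option (b), a minimal sketch using only the core Digest::MD5 module might look like the following. The names md5_of and same_md5 are hypothetical, made up for illustration, and this is not taken from my current tool.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file's raw bytes.
sub md5_of {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # needed on Windows, otherwise text-mode translation alters the bytes read
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# True if both files have the same MD5 digest.
sub same_md5 {
    my ($src, $dst) = @_;
    return md5_of($src) eq md5_of($dst);
}
```

One thing to keep in mind: hashing both sides still reads every byte of both files, so by itself this should be no faster than a full comparison. It would presumably only pay off if the digests from the previous run were cached somewhere and only the server side had to be re-read.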
I know I could code the first fairly well, but is it worth it? I don't know much about MD5 sums. I am working with files with an average size of about 123 KB, and the maximum is about 3 MB. A third option would be to check modification time first, since a later time on the server would automatically trigger a copy over to the synched media without having to go through the comparison process. However, I do think there can be false positives with this approach, if the files are regenerated automatically every day but the contents don't necessarily change. Also, I'm not sure what proportion of the files are actually updated on the server every 24 hours, so I don't know if it would save much.
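To make the third option concrete, here is a rough sketch of a tiered check: cheap tests (existence, size, modification time) first, and a byte-for-byte comparison only for files that are newer but the same size, which is where I'd expect the false positives from regenerated files. It uses only the core File::Compare module. The name needs_copy is hypothetical, and treating "not newer on the server" as "unchanged" is an assumption that may not hold everywhere.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Decide whether $src (server) should be copied over $dst (synched media).
sub needs_copy {
    my ($src, $dst) = @_;
    my @s = stat $src or die "Cannot stat $src: $!";
    my @d = stat $dst;
    return 1 unless @d;               # destination missing: copy
    return 1 if $s[7] != $d[7];       # sizes differ, so contents differ
    return 0 if $s[9] <= $d[9];       # source not newer: trust the timestamp (assumption)
    # Newer but same size: the file may have been regenerated unchanged,
    # so compare the bytes. compare() returns 0 if equal, 1 if different,
    # -1 on error; an error is treated as "copy" here.
    return compare($src, $dst) != 0 ? 1 : 0;
}
```

With this ordering, the full comparison is only paid for by files whose timestamp changed but whose size did not, which is hopefully a small fraction of the tree on any given day.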
Code available upon request (I feel we're at the algorithm stage and not at the actual coding stage yet--again, correct me if you think I should post code regardless)
TIA,
T
UPDATE:
I understand this node has been considered. In my mind it's Perl-centric as the discussion is only about how to implement solutions in Perl; perhaps I didn't make myself clear, in which case I apologize. Please let me know if this node does not belong on PM.
_________________________________________________________________________________
I like computer programming because it's like Legos for the mind.
In reply to Best practices for file synchronization? (Mod time vs. contents compare) by OfficeLinebacker