in reply to Save space with CRC
http://www.perlmonks.org/?node_id=49198
MD5 is very popular and probably suitable for this task, even though it has known weaknesses that make it undesirable for security applications. SHA1 is also a reasonable choice. Since compute power is so cheap these days, why not just use both -- concatenate the MD5 and SHA1 hashes for a very discriminating hash!
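A minimal Python sketch of the "use both" idea, assuming the files are read in chunks so that large files need not fit in memory. The function name `combined_digest` and the chunk size are illustrative choices, not anything from the thread.

```python
import hashlib

def combined_digest(path, chunk_size=1 << 16):
    """Return the MD5 hex digest followed by the SHA-1 hex digest of a file.

    Hypothetical helper: the name and chunk size are assumptions.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read the file once and feed each chunk to both hashers.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    # 32 hex characters of MD5 followed by 40 hex characters of SHA-1.
    return md5.hexdigest() + sha1.hexdigest()
```

Reading the file once and updating both hashers keeps the disk cost the same as computing a single hash.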
In any case, I'd use the length of the file as the first determinant -- that will greatly reduce the amount of comparing you have to do.
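As a rough sketch of the length-first step, assuming the candidate files are given as a list of paths: files are bucketed by size, and any file whose size is unique is dropped without ever being read. The name `group_by_size` is hypothetical.

```python
import os
from collections import defaultdict

def group_by_size(paths):
    """Map file size -> list of paths, keeping only sizes shared by 2+ files."""
    by_size = defaultdict(list)
    for p in paths:
        # getsize only consults file metadata; the contents are not read.
        by_size[os.path.getsize(p)].append(p)
    # A file with a unique size cannot have a duplicate, so discard it.
    return {size: group for size, group in by_size.items() if len(group) > 1}
```

Only the groups that survive this step need to be hashed at all.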
Update: It has just occurred to me that a really slick way of doing this would be to "incrementally evaluate" the hash function, so that you could limit the amount of each file that you read from disk. The hash function could really be a composite hash consisting of:
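One way to read "incrementally evaluate", offered as an assumption rather than the scheme the post had in mind: compare cheap keys first and only move to more expensive ones for files that still collide. The Python sketch below uses three stages (file size, a hash of the first few kilobytes, then a hash of the whole file). The stage choice, the 4096-byte prefix, and the names `digest`, `refine`, and `find_duplicates` are all illustrative.

```python
import hashlib
import os
from collections import defaultdict

def digest(path, limit=None, chunk_size=1 << 16):
    """SHA-1 hex digest of the first `limit` bytes of a file (all of it if None)."""
    h = hashlib.sha1()
    remaining = limit
    with open(path, "rb") as fh:
        while remaining is None or remaining > 0:
            n = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = fh.read(n)
            if not chunk:
                break  # end of file reached
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()

def refine(groups, key):
    """Split each candidate group by key(path); drop groups left with one member."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for p in group:
            buckets[key(p)].append(p)
        out.extend(g for g in buckets.values() if len(g) > 1)
    return out

def find_duplicates(paths, prefix_bytes=4096):
    """Return groups of paths that agree on size, prefix hash, and full hash."""
    groups = [list(paths)]
    groups = refine(groups, os.path.getsize)                    # no reads
    groups = refine(groups, lambda p: digest(p, prefix_bytes))  # small read
    groups = refine(groups, digest)                             # full read
    return groups
```

Each stage only touches files that the previous stage could not tell apart, so most files are never read in full.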
Re^2: Save space with CRC
by Anonymous Monk on Dec 19, 2007 at 19:03 UTC
by perlfan (Parson) on Dec 19, 2007 at 20:19 UTC
by Anonymous Monk on Dec 20, 2007 at 00:58 UTC