Is using the entire line as the hash key *better* than using an MD5 digest of the line as the hash key?
Why do you think that using an MD5 hash is necessary, useful, or preferable?
Perl's hashes are very well tried and tested. Here are some reasons why I'd do it this way:
There's nothing to do, Perl's hashes simply work.
The only reason I know of for not using them is that they can be memory-hungry. With the tiny volumes of data you are talking about, this is not a problem.
Perl's hashing algorithm is way, way faster than MD5.
MD5 collisions may be rare, but they are absolutely possible. Any algorithm that relies on the uniqueness of MD5 digests should include a mechanism to detect collisions, no matter how rare they are.
You'd simply be having Perl hash the ASCII or binary representation of the MD5 digest of the entire line, instead of hashing the line itself. What could that possibly buy you?
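To make the comparison concrete, here is a minimal sketch in Python rather than Perl (the function names and sample log lines are hypothetical, and lines are assumed to be UTF-8 text). The first function keys directly on the whole line, which is what the familiar Perl idiom `grep { !$seen{$_}++ } @lines` does. The second keys on the MD5 hex digest, but to detect collisions it has to keep the original line anyway, so it saves no memory, and the digest string is itself hashed again by the dictionary.

```python
import hashlib


def dedupe_by_line(lines):
    """Keep the first occurrence of each line, using the line itself as the key."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


def dedupe_by_md5(lines):
    """Same result, keyed by MD5 digest (hypothetical alternative).

    The original line must be stored alongside the digest, otherwise
    a collision between two different lines would go unnoticed.
    """
    seen = {}  # hex digest -> first line that produced it
    out = []
    for line in lines:
        # The dict will hash this digest string again: a hash of a hash.
        digest = hashlib.md5(line.encode("utf-8")).hexdigest()
        prev = seen.get(digest)
        if prev is None:
            seen[digest] = line
            out.append(line)
        elif prev != line:
            # Two different lines with the same MD5: a genuine collision.
            raise ValueError(f"MD5 collision: {prev!r} vs {line!r}")
    return out


if __name__ == "__main__":
    logs = ["a: start", "b: stop", "a: start", "c: error", "b: stop"]
    print(dedupe_by_line(logs))  # ['a: start', 'b: stop', 'c: error']
    print(dedupe_by_md5(logs))   # same result, with extra work
```

Both functions return the same lines; the MD5 version just does more work to get there.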
Does that answer your question?