in reply to Simple Digests?
CRC stands for Cyclic Redundancy Check: a fast, reasonably reliable way to detect that a message was accidentally damaged in transit. It is not designed to resist deliberate tampering.
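Here is a minimal sketch of the CRC idea. It uses Python's `zlib.crc32` as a stand-in for whatever CRC routine you would use in Perl. The sender computes a CRC-32 over the message and transmits it along with the message. The receiver recomputes it and compares. The helper names `crc_of` and `arrived_intact` are hypothetical, chosen for illustration.

```python
import zlib

def crc_of(message: bytes) -> int:
    # CRC-32 of the message; in Python 3 zlib.crc32 returns an unsigned 32-bit int
    return zlib.crc32(message)

def arrived_intact(message: bytes, expected_crc: int) -> bool:
    # Receiver side: recompute the checksum and compare with the transmitted one
    return crc_of(message) == expected_crc

original = b"Subject: hello\r\n\r\nhello world\r\n"
checksum = crc_of(original)

# Flip a single bit to simulate accidental damage in transit
damaged = bytearray(original)
damaged[10] ^= 0x01

print(arrived_intact(original, checksum))        # True
print(arrived_intact(bytes(damaged), checksum))  # False: CRC-32 catches every single-bit error
```

A CRC is cheap to compute, which is why it suits transit checks. Because it is linear, however, anyone who can alter the message can also fix up the checksum.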
GrandFather makes an excellent point about uniqueness. Digests are very good for integrity checking, because the odds are very low that two different messages, even substantially similar ones, will produce the same digest. However, you are reducing a message of arbitrary size to a fixed-size digest (16 bytes, for MD5), so there are far more possible messages than possible digests. Collisions will definitely exist; the only question is when you will run into one.
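To make the size argument concrete, here is a small sketch using Python's `hashlib`. It shows that inputs of any length map to a 16-byte MD5 digest. It also gives a rough birthday-bound estimate: with n distinct messages and a b-bit digest, the chance of at least one accidental collision is approximately n(n-1)/2^(b+1) while that number is small. The function `approx_collision_probability` is a hypothetical helper. The estimate covers accidental collisions only. MD5 is known to be vulnerable to deliberately constructed collisions.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Inputs of very different sizes all reduce to a 16-byte (128-bit) digest
for size in (0, 11, 1_000_000):
    print(size, len(hashlib.md5(b"x" * size).digest()))  # digest length is always 16

# Nearly identical messages give unrelated-looking digests
print(md5_hex(b"hello world"))   # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(md5_hex(b"hello world!"))

def approx_collision_probability(n: int, bits: int = 128) -> float:
    # Birthday-bound approximation, valid while the result is much smaller than 1
    return n * (n - 1) / 2 ** (bits + 1)

# A billion distinct messages: tiny, but not zero
print(approx_collision_probability(10**9))
```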
Now, if you need to uniquely identify e-mail messages to use as keys, there are a few ways to do it. One is to take another "pretty unique" attribute of the e-mail message and append it to the hash string. For example, the Message-ID: header should already be a unique value, and combined with a digest of the entire message, the chance of collision is essentially zero.
For example, suppose the Message-ID were <907068073421@smtp.yourhost.com> and the digest of the message were 5eb63bbbe01eeed093cb22bb8f5acdc3 (the MD5 of "hello world", if you care). Your key might then be 907068073421@smtp.yourhost.com||5eb63bbbe01eeed093cb22bb8f5acdc3. That's pretty likely to be unique!
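Here is a sketch of that composite key in Python. It assumes the raw message is available as bytes. It uses the standard `email` package to read the Message-ID header and `hashlib` to digest the whole raw message. The function name `message_key` is hypothetical, and so is the choice to strip the angle brackets. The `||` separator follows the example key given in this reply.

```python
import hashlib
from email import message_from_bytes

def message_key(raw: bytes) -> str:
    # Digest of the entire raw message, exactly as received
    digest = hashlib.md5(raw).hexdigest()
    msg = message_from_bytes(raw)
    # Message-ID is normally written as <local@domain>; drop whitespace and brackets.
    # If the header is missing, this falls back to an empty string, leaving only the digest.
    msg_id = (msg.get("Message-ID") or "").strip().strip("<>")
    return f"{msg_id}||{digest}"

raw = (b"Message-ID: <907068073421@smtp.yourhost.com>\r\n"
       b"Subject: test\r\n\r\nhello world\r\n")
print(message_key(raw))
```

Because the digest covers the headers as well as the body, the digest part of this key will not match the bare MD5 of "hello world".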
You could also use something like Data::UUID to associate the message with a truly unique identifier. I don't know whether this would work for your application, because I don't know your requirements. I'm guessing you want to be able to derive the key from the message? If so, then Data::UUID won't work for you. A freshly generated UUID has no connection to the message's contents, so it can't be recomputed later.
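As an illustration of why a generated identifier can't be derived from the message, here is a sketch using Python's `uuid` module as a stand-in for Data::UUID. It does not show Data::UUID's own API. Assigning a random (version 4) UUID to the same message twice gives two different keys. The association therefore has to be stored somewhere. Both `assign_key` and the in-memory `store` are hypothetical.

```python
import uuid

def assign_key(raw_message: bytes) -> str:
    # The message content is deliberately ignored: a version-4 UUID is random,
    # which is exactly why it cannot be recomputed from the message later.
    return str(uuid.uuid4())

# Hypothetical in-memory store; a real application would persist this mapping
store: dict[str, bytes] = {}

raw = b"Subject: test\r\n\r\nhello world\r\n"
key1 = assign_key(raw)
key2 = assign_key(raw)
store[key1] = raw

print(key1 == key2)  # almost certainly False: same message, different keys
```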
Re^2: Simple Digests?
by pileofrogs (Priest) on Mar 28, 2006 at 18:54 UTC
by radiantmatrix (Parson) on Mar 29, 2006 at 15:13 UTC
by pileofrogs (Priest) on Mar 31, 2006 at 02:20 UTC
by shotgunefx (Parson) on Mar 29, 2006 at 21:04 UTC
by pileofrogs (Priest) on Mar 31, 2006 at 02:21 UTC
by shotgunefx (Parson) on Apr 01, 2006 at 02:02 UTC