Re^3: MD5-based Unique Session ID Generator
by stvn (Monsignor) on Aug 19, 2004 at 15:29 UTC
Very true, but the double md5_hex() doesn't hurt (as far as I know).
As I said, I am no crypto expert, and my knowledge of these things is limited. But I would think that hashing a reasonably unique string to produce a pretty-darn-close-to-unique string, and then hashing it again to get (what I would assume is) an even closer to truly unique string, is a good thing when generating session ids. Please though, if I am wrong and the double hash provides no benefit, let me know why, as I would be interested in knowing.
I would even go so far as to say that double hashing can actually increase the number of collisions. If you get a collision with the first hash, you are guaranteed to get a collision with the second hash, but if you don't get a collision with the first hash, you still have a chance of getting one with the second hash.
Say you start with X and Y.
hash(X) = X'
hash(Y) = Y'
hash(X') = X''
hash(Y') = Y''
If X' = Y' (a collision at the first hash), then X'' = Y'' (a guaranteed collision at the second hash).
If X' != Y' (no collision at the first hash), then X'' may still equal Y'' (a possible collision at the second hash).
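If you'd rather see the effect than take it on faith, here is a toy experiment I'd run (my own sketch, not anything from the posts above): it truncates MD5 to its first 4 hex digits purely so collisions become frequent enough to count, then compares how many collisions the single and the double hash produce over the same random inputs. The truncation and the input strings are assumptions for the demo only.

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# 16-bit "toy hash": the first 4 hex digits of MD5, used only so that
# collisions become frequent enough to observe
sub tiny_hash { return substr(md5_hex($_[0]), 0, 4) }

my (%seen_single, %seen_double);
my ($single_coll, $double_coll) = (0, 0);

for my $i (1 .. 50_000) {
    my $input = "input-$i-" . rand();
    my $once  = tiny_hash($input);      # single hash
    my $twice = tiny_hash($once);       # double hash

    $single_coll++ if $seen_single{$once}++;
    $double_coll++ if $seen_double{$twice}++;
}

print "collisions after one hash:   $single_coll\n";
print "collisions after two hashes: $double_coll\n";

Since every first-stage collision is carried through to the second stage, and the second stage can only add new ones, the double-hash count can never come out lower than the single-hash count.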
Please though, if I am wrong and the double hash provides no benefit, let me know why, as I would be interested in knowing.
It doesn't help. Here's why: the md5_hex of a given value will always be the same. So, if md5_hex("hey") is always the same, then md5_hex(md5_hex("hey")), while it is a different digest from the first, will also always be the same. Try it yourself.
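For anyone who wants to try it, a minimal sketch (the "hey" string is just the example value from the paragraph above):

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

print md5_hex("hey"), "\n";             # same digest every time
print md5_hex("hey"), "\n";
print md5_hex(md5_hex("hey")), "\n";    # a different digest, but just as fixed
print md5_hex(md5_hex("hey")), "\n";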
If the value fed into the first round of md5_hex isn't random, no amount of repetition will make it any more unique. If you were using an encryption algorithm rather than a cryptographic digest, the extra pass might help (depending on the algorithm).
HTH. (BTW: I'm not a crypto expert either, but I have done some amount of research trying to better understand it. If I'm Full O' Shite™, please tell me!)
As we don't want to exercise Cargo Cult, let's see what's done in this snippet:
use strict;
use Digest::MD5 qw/md5_hex/;
for (1..10) {
    my $rand_id    = time() . {} . rand() . $$;
    my $session_00 = md5_hex($rand_id);
    my $session_01 = substr(md5_hex($rand_id), 0, 32);
    my $session_02 = substr(md5_hex(md5_hex($rand_id)), 0, 32);
    printf "%s\n%s %s %s\n\n", $rand_id, $session_00, $session_01, $session_02;
}
"$rand_id" is composed of a couple of items to generate uniqueness:
"time()", "rand()" and "$$" are good for that while "{}", ref to an anonymous hash, doesnt help much, because it's always the same. It is possible to create more than 1 session id within 1 second but it's very unlikely to get more than 1 duplicate random within 1 second. So uniqueness is achieved.
It's a good idea to hash the "readable" id to put it in a regular, non human readable string format. This hashing does not improve the "uniqueness" of the id. It makes it more difficult to be guessed or hacked but that's it!
To hash it a second time doesn't do anything, nor good nor bad(besides performance).
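To put a rough number on the "besides performance" part, a quick benchmark along these lines would do it (the Benchmark comparison and the sample input are my own additions, not part of the snippet above):

use strict;
use warnings;
use Benchmark qw(cmpthese);
use Digest::MD5 qw(md5_hex);

# an input in the same style as the snippet above
my $rand_id = time() . {} . rand() . $$;

# run each variant for at least 2 CPU seconds and compare the rates
cmpthese(-2, {
    single => sub { md5_hex($rand_id) },
    double => sub { md5_hex(md5_hex($rand_id)) },
});

On any reasonable machine both are far too fast to matter for session creation; the second pass simply adds a second MD5 computation on top of the first.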
As we don't want to exercise Cargo Cult....
Guilty as charged, and I thank you for pointing these details out.
"time()", "rand()" and "$$" are good for that while "{}", ref to an anonymous hash, doesnt help much, because it's always the same.
Actually, what you are seeing with the repeating "{}" value will not always be true. It seems (from my experimentation (look ma, no Cargo Cult)) that the repeating value you were seeing was something like the first memory location perl allocates. So on each pass through the loop you were seeing that location reaped and reused, and even when I forked each time within the loop, it did the same thing. However, if you can be sure that this is not the first (?) ref created, you get a bit more randomness in that value. See the code below (spaces added for readability):
my @rand;
for (1..10) {
    # add a random number of elements to the array
    push @rand => $_ for (0 .. ((rand() * 10) % 10));
    my $rand_id = time() . " " . { time => time() } . " " . rand() . " " . $$;
    printf "%s\n", $rand_id;
}
__OUTPUT__
1092953284 HASH(0x1806be4) 0.351068406456278 15758
1092953284 HASH(0x180820c) 0.581041221829715 15758
1092953284 HASH(0x1808230) 0.936157439122312 15758
1092953284 HASH(0x1808284) 0.183180004399297 15758
1092953284 HASH(0x18082c0) 0.943342015904591 15758
1092953284 HASH(0x1808338) 0.424439000654708 15758
1092953284 HASH(0x1808350) 0.935454533284215 15758
1092953284 HASH(0x180838c) 0.771976549032949 15758
1092953284 HASH(0x1808398) 0.549340888274884 15758
1092953284 HASH(0x18083e0) 0.984217993290265 15758
Now of course, as the OP has pointed out to us, not all session generation is alike. This may not work for you if your script starts a fresh perl interpreter each time and the hash-ref always gets the same value. However, if you are in a long-running process, this would seem to contribute to the initial entropy.