Re: MD5-based Unique Session ID Generator

I would think hostname is a pretty hefty operation for generating a session id. I'm not sure, but I think it does a DNS lookup.

The ID is based on hostname, time, and some pseudo-random data. I've run a test with this to generate 50,000 IDs as fast as possible and check for collisions -- I didn't get any.

I use this for session ids (which I took from one of the Apache::Session modules):

# md5_hex must be imported explicitly; Digest::MD5 exports nothing by default
use Digest::MD5 qw(md5_hex);
$session_id = substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);

I ran it within the same process over 100,000 times with no collisions.
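
A collision check like the one described above could be sketched roughly as follows. The helper names `make_session_id` and `count_collisions` are hypothetical and only illustrate the idea; the exact test that was run is not shown in this post.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build one session ID the same way as the one-liner above.
sub make_session_id {
    return substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);
}

# Generate $count IDs and return how many repeated an earlier ID.
sub count_collisions {
    my ($count) = @_;
    my %seen;
    my $collisions = 0;
    for (1 .. $count) {
        my $id = make_session_id();
        # $seen{$id}++ is false the first time an ID appears, true afterwards.
        $collisions++ if $seen{$id}++;
    }
    return $collisions;
}

printf "%d collisions in %d IDs\n", count_collisions(100_000), 100_000;
```

A run like this only says something about one process on one machine with its particular rand() implementation, so a clean result is encouraging rather than a guarantee.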

This is sort of slow, but strong. Reducing the param for rand() will speed things up, but will make collisions more likely.

I am no crypto expert, but from what I know, it's not really any stronger than if you didn't do it this way. With MD5 and different text each time, it is highly unlikely that you will actually find a collision; that is just the nature of MD5 and hashing algorithms in general.

-stvn


Replies are listed 'Best First'.
Re^2: MD5-based Unique Session ID Generator by pelagic (Priest) on Aug 19, 2004 at 15:09 UTC
Only the middle part of your expression, `(time() . {} . rand() . $$)`, helps make the session ids unique.
pelagic
Re^3: MD5-based Unique Session ID Generator by stvn (Monsignor) on Aug 19, 2004 at 15:29 UTC
Very true, but the double `md5_hex()` doesn't hurt (as far as I know). As I said, I am no crypto expert, and my knowledge of these things is limited. But I would think that hashing a reasonably unique string to produce a pretty-darn-close-to-unique string, and then hashing it again to get (what I would assume is) an even closer to truly unique string, is a good thing when generating session ids. Please though, if I am wrong, and the double hash provides no benefit, let me know why, as I would be interested in knowing.
-stvn
Re^4: MD5-based Unique Session ID Generator by ctilmes (Vicar) on Aug 19, 2004 at 20:01 UTC
Double hashing without adding something else to it gains you nothing. If you get a collision with the first hash, you'll always get a collision with the second as well. The double hash just costs you processing time.
Re^5: MD5-based Unique Session ID Generator by ctilmes (Vicar) on Aug 20, 2004 at 12:12 UTC
Re^4: MD5-based Unique Session ID Generator by radiantmatrix (Parson) on Aug 19, 2004 at 20:37 UTC
"Please though, if I am wrong, and the double hash provides no benefit let me know why, as I would be interested in knowing."

It doesn't help. Here's why: the `md5_hex` of a given value will always be the same. So, if `md5_hex("hey")` is always the same, then `md5_hex(md5_hex("hey"))`, while it will be a different digest than the first, will be consistently the same. Try it yourself. If the value for the first round of `md5_hex` isn't random, no amount of repetition will create a unique value.

If you were using an encryption rather than a cryptographic digest algorithm, then the extra pass may help (depending on the algo.). HTH.

(BTW: I'm not a crypto expert either, but I have done some amount of research trying to better understand it. If I'm Full O' Shite™, please tell me!)
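
For anyone who wants to "try it yourself", here is a minimal sketch of that experiment. The variable names are only illustrative; the input string "hey" is the one used in the reply above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $input  = "hey";
my $single = md5_hex($input);     # one pass
my $double = md5_hex($single);    # second pass over the first digest

# Hashing the same input again always yields the same digest ...
print "single pass repeatable: ",
    ($single eq md5_hex($input) ? "yes" : "no"), "\n";

# ... and so does the double hash, because its input is that fixed digest.
print "double pass repeatable: ",
    ($double eq md5_hex(md5_hex($input)) ? "yes" : "no"), "\n";

# The second pass gives a different string, but not a "more unique" one:
# two equal inputs still end up with two equal double digests.
print "single differs from double: ",
    ($single ne $double ? "yes" : "no"), "\n";
```

Because each pass is a pure function of its input, any two inputs that produce the same first digest necessarily produce the same second digest, which is the point made in the replies above.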
Re^4: MD5-based Unique Session ID Generator by pelagic (Priest) on Aug 19, 2004 at 20:56 UTC
As we don't want to practice Cargo Cult programming, let's see what's done in this snippet:

use strict;
use Digest::MD5 qw/md5_hex/;
for (1..10) {
    my $rand_id    = time() . {} . rand() . $$;
    my $session_00 = md5_hex($rand_id);
    my $session_01 = substr(md5_hex($rand_id), 0, 32);
    my $session_02 = substr(md5_hex(md5_hex($rand_id)), 0, 32);
    printf "%s\n%s %s %s\n\n", $rand_id, $session_00, $session_01, $session_02;
}

"$rand_id" is composed of a couple of items to generate uniqueness: "time()", "rand()" and "$$" are good for that, while "{}", a ref to an anonymous hash, doesn't help much, because it's always the same. It is possible to create more than 1 session id within 1 second, but it's very unlikely to get the same random number twice within 1 second. So uniqueness is achieved.

It's a good idea to hash the "readable" id to put it in a regular, non-human-readable string format. This hashing does not improve the "uniqueness" of the id. It makes it more difficult to guess or hack, but that's it! Hashing it a second time doesn't do anything, neither good nor bad (besides costing performance).
pelagic
Re^5: MD5-based Unique Session ID Generator by stvn (Monsignor) on Aug 19, 2004 at 22:11 UTC
Re^2: MD5-based Unique Session ID Generator by radiantmatrix (Parson) on Aug 19, 2004 at 20:31 UTC
I hadn't thought of doubling the md5_hex operations -- nice tip, thank you.

"I am no crypto expert, but from what I know, it's not really any stronger than if you didn't do it this way. With MD5 and different text each time, it is highly unlikely that you will actually find a collision; that is just the nature of MD5 and hashing algorithms in general."

It's not the use of MD5 that causes issues -- it's the random data that one is hashing. If the text is always different, great -- but on systems with poor PRNGs (Win2k springs to mind), I have gotten duplicate MD5 digests because the inputs weren't random enough: MD5 the same text twice, and you get the same digest each time. With the same algo above, except s/2345678/2345/, I had 11 collisions in 20,000 generated sessions. Not Good™.

Again, though, I will have to try your much faster (and shorter) method and see if I get good results with a poor PRNG -- thanks!
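
The effect of a too-small random range can be isolated with a sketch like the one below. This is not the algorithm from the original post: the helpers `weak_id` and `duplicates` are hypothetical, and time and PID are deliberately left out so that only the range passed to rand() matters.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: hash a "random" integer drawn from 0 .. $range-1.
sub weak_id {
    my ($range) = @_;
    return md5_hex(int(rand($range)));
}

# Count how many of $count generated IDs repeat an earlier one.
sub duplicates {
    my ($count, $range) = @_;
    my %seen;
    my $dups = 0;
    for (1 .. $count) {
        $dups++ if $seen{ weak_id($range) }++;
    }
    return $dups;
}

# Compare a small range with a larger one for the same number of IDs.
for my $range (2345, 2_345_678) {
    printf "range %9d: %d duplicates in 20000 IDs\n",
        $range, duplicates(20_000, $range);
}
```

With only 2,345 possible inputs, at least 17,655 of the 20,000 IDs must repeat, however good the digest is. MD5 maps equal inputs to equal outputs, so the entropy of the input sets the ceiling.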
Re^3: MD5-based Unique Session ID Generator by stvn (Monsignor) on Aug 19, 2004 at 22:19 UTC
"Again, though, I will have to try your much faster (and shorter) method and see if I get good results with a poor PRNG -- thanks!"

Just FYI, see my reply/discussion above with pelagic regarding the use of the added "{}". This bit of it may or may not provide any benefit.
-stvn