in reply to MD5-based Unique Session ID Generator

I would think hostname is a pretty hefty operation for generating a session ID. I'm not sure, but I think it does a DNS lookup.

The ID is based on hostname, time, and some pseudo-random data. I've run a test with this to generate 50,000 IDs as fast as possible and check for collisions -- I didn't get any.

I use this for session ids (which I took from one of the Apache::Session modules)

use Digest::MD5 qw(md5_hex);
$session_id = substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);
I ran it within the same process over 100,000 times with no collisions.
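
For anyone who wants to repeat that check, here is a minimal sketch of such a collision test. The sub names make_session_id and count_collisions are made up for illustration, and this is not necessarily the exact script that was run:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build one session ID the same way as the one-liner above.
sub make_session_id {
    return substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);
}

# Generate $count IDs in this process and return how many were duplicates.
sub count_collisions {
    my ($count) = @_;
    my %seen;
    my $collisions = 0;
    for (1 .. $count) {
        # Post-increment returns the old count, so this is true on a repeat.
        $collisions++ if $seen{ make_session_id() }++;
    }
    return $collisions;
}

printf "%d collisions in %d IDs\n", count_collisions(100_000), 100_000;
```

Note that this only tests a single process on a single machine; it says nothing about IDs generated concurrently by several processes.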

This is sort of slow, but strong. Reducing the param for rand() will speed things, but make collisions more likely.

I am no crypto expert, but from what I know, it's not really any stronger than if you didn't do it this way. Using MD5 with different text each time, it is highly unlikely that you will find a collision; that is just the nature of MD5 and of hashing algorithms in general.

-stvn

Re^2: MD5-based Unique Session ID Generator
by pelagic (Priest) on Aug 19, 2004 at 15:09 UTC
    Only the middle part of your expression, (time() . {} . rand() . $$), helps make the session IDs unique.

    pelagic

      Very true, but the double md5_hex() doesn't hurt (as far as I know).

      As I said, I am no crypto expert, and my knowledge of these things is limited. But I would think that hashing a reasonably unique string to produce a pretty-darn-close-to-unique string, and then hashing it again to get (what I would assume is) an even closer to truly unique string, is a good thing when generating session IDs. Please, though, if I am wrong and the double hash provides no benefit, let me know why, as I would be interested in knowing.

      -stvn
        Double hashing without adding something else to it gains you nothing. If you get a collision with the first hash, you'll always get a collision with the second as well.

        The double hash just costs you processing time.

        Please though, if I am wrong, and the double hash provides no benefit let me know why, as I would be interested in knowing.

        It doesn't help. Here's why: the md5_hex of a given value will always be the same. So, if md5_hex("hey") is always the same, then md5_hex(md5_hex("hey")), while it will be a different digest than the first, will be consistently the same. Try it yourself.

        If the value for the first round of md5_hex isn't random, no amount of repetition will create a unique value. If you were using an encryption algorithm rather than a cryptographic digest algorithm, then the extra pass might help (depending on the algorithm).
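
The "try it yourself" suggestion can be sketched like this. The helper names single_hash and double_hash and the two identical inputs are made up for illustration:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hash once, or hash the resulting hex digest a second time.
sub single_hash { my ($text) = @_; return md5_hex($text) }
sub double_hash { my ($text) = @_; return md5_hex(md5_hex($text)) }

# Two sessions that unluckily ended up with the same input string.
my ($input_a, $input_b) = ("hey", "hey");

print "single: ", (single_hash($input_a) eq single_hash($input_b) ? "collide" : "differ"), "\n";
print "double: ", (double_hash($input_a) eq double_hash($input_b) ? "collide" : "differ"), "\n";
```

Both lines report a collision: the second pass is a fixed function of the first digest, so equal inputs stay equal however many times they are hashed.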

        HTH. (BTW: I'm not a crypto expert either, but I have done some amount of research trying to better understand it. If I'm Full O' Shite™, please tell me!)

        As we don't want to practice cargo-cult programming, let's see what this snippet does:
        use strict;
        use Digest::MD5 qw/md5_hex/;

        for (1..10) {
            my $rand_id    = time() . {} . rand() . $$;
            my $session_00 = md5_hex($rand_id);
            my $session_01 = substr(md5_hex($rand_id), 0, 32);
            my $session_02 = substr(md5_hex(md5_hex($rand_id)), 0, 32);
            printf "%s\n%s %s %s\n\n", $rand_id, $session_00, $session_01, $session_02;
        }
        "$rand_id" is composed of a couple of items to generate uniqueness:
        "time()", "rand()" and "$$" are good for that, while "{}", a reference to an anonymous hash, doesn't help much, because it's always the same. It is possible to create more than one session ID within one second, but it's very unlikely that rand() returns the same number twice within that second. So uniqueness is achieved.
        It's a good idea to hash the "readable" ID to put it into a regular, non-human-readable string format. This hashing does not improve the "uniqueness" of the ID. It makes it more difficult to guess or hack, but that's it!
        Hashing it a second time doesn't do anything, neither good nor bad (apart from the performance cost).
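
To see what "{}" actually contributes to the string, a small sketch like the following can help. The sub name anon_hash_string is made up for illustration, and how many distinct values turn up depends on how perl happens to reuse memory on your system, so no particular result is promised:

```perl
use strict;
use warnings;

# Stringify a fresh anonymous hash reference, as the ID expression does.
# The result has the form HASH(0x...), where the hex part is a memory address.
sub anon_hash_string {
    my $ref = {};
    return "$ref";
}

# Collect the distinct strings seen over ten calls.
my %distinct;
$distinct{ anon_hash_string() }++ for 1 .. 10;

print "example: ", anon_hash_string(), "\n";
printf "%d distinct value(s) in 10 calls\n", scalar keys %distinct;
```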

        pelagic
Re^2: MD5-based Unique Session ID Generator
by radiantmatrix (Parson) on Aug 19, 2004 at 20:31 UTC
    I hadn't thought of doubling the md5_hex operations -- nice tip, thank you.
    I am no crypto expert, but from what I know, it's not really any stronger than if you didn't do it this way. Using MD5 with different text each time, it is highly unlikely that you will find a collision; that is just the nature of MD5 and of hashing algorithms in general.
    It's not the use of MD5 that causes issues -- it's the random data that one is hashing. If the text is always different, great -- but on systems with poor PRNGs (Win2k springs to mind), I have gotten MD5 collisions because the inputs weren't random enough: MD5 the same text twice, and you get the same digest each time. With the same algo above, except s/2345678/2345/, I had 11 collisions in 20,000 generated sessions. Not Good™.
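
A simplified sketch of that effect follows. It is not the generator discussed above: the input here is only a random integer, the helper name count_repeats is made up, and using 2345 and 2345678 as the ranges is an assumption about what those numbers meant:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hash $count random integers drawn from 0 .. $range-1 and count repeats.
# Hypothetical helper; by assumption the ONLY input is the random number.
sub count_repeats {
    my ($range, $count) = @_;
    my %seen;
    my $repeats = 0;
    for (1 .. $count) {
        my $digest = md5_hex(int(rand($range)));
        $repeats++ if $seen{$digest}++;
    }
    return $repeats;
}

for my $range (2345, 2345678) {
    printf "range %7d: %d repeated IDs in 20000\n", $range, count_repeats($range, 20_000);
}
```

With only 2345 possible inputs, at least 17,655 of the 20,000 digests must be repeats, because MD5 cannot produce more distinct outputs than it is given distinct inputs. Mixing in time() and $$, as the real generators do, reduces the damage but does not remove the underlying problem.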

    Again, though, I will have to try your much faster (and shorter) method and see if I get good results with a poor PRNG -- thanks!

      Again, though, I will have to try your much faster (and shorter) method and see if I get good results with a poor PRNG -- thanks!

      Just FYI, see my reply/discussion above with pelagic regarding the use of the added "{}". That bit may or may not provide any benefit.

      -stvn