in reply to Re: Making a base32 representation of md5
in thread Making a base32 representation of md5

Why? Taking the ordinate doesn't make the file name any more unique than not taking it. If you want safety, you would be better off using Data::UUID, which, as the docs say, will generate a UUID that "…is 128 bits long, and is guaranteed to be different from all other UUIDs/GUIDs generated until 3400 CE."

Granted, that's limited to your domain, but I still doubt it will be a serious problem. And, by the time it causes issues, you will be dead. ;-) MD5 sums can collide, as any hash algorithm can -- it's just very hard to deliberately construct two messages with the same signature that could possibly be mistaken for each other.

MD5 is not for establishing uniqueness; it's for signing data so you can validate that it has not changed since it was first signed.

Anima Legato
.oO all things connect through the motion of the mind

Re^3: Making a base32 representation of md5
by Tanktalus (Canon) on Mar 17, 2005 at 23:36 UTC

    The problem with the guaranteed version of Data::UUID is that you can't recreate the same UUID a second time. That means that if you want the same file back, you can't just generate the UUID again to find out which directory it's in; you have to scan all of them. What is wanted here is a hashing algorithm: put in some piece of data (possibly including characters that cannot be represented on the filesystem), get a directory to store it in, and then be able to retrieve it when you pass in the same piece of data.
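
    A minimal sketch of that idea, in Python for illustration only; this is not the implementation mentioned below. The function name `storage_dir` and the `store` root directory are hypothetical. It hashes the key with MD5 and base32-encodes the 16-byte digest, so the result contains only letters and the digits 2-7 regardless of what the key contains.

```python
import base64
import hashlib
from pathlib import Path


def storage_dir(key: bytes, root: str = "store") -> Path:
    """Map arbitrary bytes to a filesystem-safe directory under root.

    Hypothetical helper: the same key always yields the same path,
    so no scan is needed to find the directory again.
    """
    digest = hashlib.md5(key).digest()  # 16 raw bytes
    # Base32 emits only A-Z and 2-7; drop the '=' padding and lowercase it.
    name = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return Path(root) / name


# The key may contain characters a filesystem would reject.
first = storage_dir(b"some/key:with*odd?chars")
second = storage_dir(b"some/key:with*odd?chars")
print(first == second)
```

    Because the path is derived from the key alone, storing and retrieving both reduce to calling the same function.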

    I actually have an implementation of this that is ready to go on CPAN ... as soon as my manager allows me to do so.

Re^3: Making a base32 representation of md5
by holli (Abbot) on Mar 18, 2005 at 17:54 UTC
    It's because I store "unique" files and the best way to ensure and quickly check that (without a db or additional db-file) is to simply save them with the checksum as the name.
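
    Roughly what that looks like, as a Python sketch rather than the actual code: `store_unique` and its return convention are made up for illustration. The file is named after the MD5 hex digest of its content, so checking for a duplicate is a single existence test.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def store_unique(data: bytes, root: Path) -> Tuple[Path, bool]:
    """Save data under its MD5 hex digest; report whether it was new.

    Hypothetical helper: returns (path, True) if the file was written,
    (path, False) if a file with that checksum name already existed.
    """
    target = root / hashlib.md5(data).hexdigest()
    if target.exists():
        # Same checksum name already present: treated as a duplicate.
        return target, False
    root.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target, True
```

    Note that the existence test treats any two inputs with the same checksum as the same file, which is exactly where a collision would bite.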

    As for collisions, I thought about that before. I think I'll add another checksum algorithm, SHA, to the name.
    Using two independent algorithms should save me from any collision. Then it's more likely that the whole building spontaneously tunnels into another universe.
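
    A sketch of the combined name in Python. The post only says "SHA", so SHA-1 is an assumption here, as are the function name `combined_name` and the `-` separator.

```python
import hashlib


def combined_name(data: bytes) -> str:
    """Build a file name from two independent digests of the same data.

    Assumption: "SHA" is taken to mean SHA-1; the separator is arbitrary.
    """
    md5_hex = hashlib.md5(data).hexdigest()    # 32 hex characters
    sha1_hex = hashlib.sha1(data).hexdigest()  # 40 hex characters
    return f"{md5_hex}-{sha1_hex}"


print(combined_name(b"example content"))
```

    Two different files would have to collide under both algorithms at once to end up with the same name.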


    holli, /regexed monk/