in reply to How safe is truncating an MD5 digest string?

I beleive that the nature of a good cryptographic hash function is that the output appears totally random and uniform. The probability of a match is directly related to the number of bits you keep. So, if you keep 64 bits, you have a 2**-64 probability of a match. That's a whole lot more probable than 2**-128, but still pretty rare on your scale.

a digest function that hashes uniformly into a 28-32 bit address space? Try CRC-32. Or pick n bits from MD5 or SHA1.

  • Comment on Re: How safe is truncating an MD5 digest string?

Replies are listed 'Best First'.
Re: Re: How safe is truncating an MD5 digest string?
by no_slogan (Deacon) on Sep 11, 2001 at 01:27 UTC
    CRC-32 is a bad idea. It's completely linear, which makes it easy to attack. Given a CRC of some data, it's not hard to compute the CRC of some other data that's mostly the same. Cryptographic hash functions like MD5 have nonlinear steps that make this difficult.

    The reason hash functions produce such long outputs is to resist birthday attacks. That's where someone finds two hash inputs that result in the same output. It sounds like your system won't be vulnerable to a birthday attack, though, since the users don't pick the input to the hash function - you pick it for them. I have to echo everyone else and say, "it's probably ok to shorten MD5."

    BTW, the name "birthday attack" comes from the observation that, if you walk into a room containing 20 people, it's unlikely that one of them will have the same birthday as you. However, it's fairly likely that two of them will have the same birthday as each other.

      Are you saying that given a size and a target CRC checksum other than zero, it's easy to compose a message of length size that produces the target checksum?

      Making a small change to the data, including changing one bit, should produce a totally different checksum, since that's what it was designed to do in the first place.

      —John

        Are you saying that given a size and a target CRC checksum other than zero, it's easy to compose a message of length size that produces the target checksum?
        Yes.

        Also, if you know the CRC of some data, you can calculate the CRC of "data xor something", even if you don't know what the data was!

        use String::CRC32; # given $crc == crc32($data) $crc2 = $crc ^ crc32($diff) ^ crc32("\0" x length($diff)); # now, $crc2 == crc32($data ^ $pad.$diff) # where $pad = "\0" x (length($data) - length($diff))
Re^2: How safe is truncating an MD5 digest string?
by Anonymous Monk on Apr 22, 2009 at 21:05 UTC
    Perhaps I am mistaken, but I believe some care should be taken in calculating the collision math when truncating the hash.

    See http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates

    If you use only 64 bits, the probability of a collision is not 1 in 2**64. Rather, it is dependent upon both the number of bits retained and the number of generated keys. For 1 million iterations, the probability of a collision is:

    P(n) = 1 - e ** -(((10**6)**2)/(2*2**64)) = 2.7 E -8

    which is much higher than 1/2**64 = 5.4 E -20

    This may not seem significant, but what happens if you select only 36 bits from the hash (9 characters)? You are nearly guaranteed a collision in 1 million iterations:

    P(n) = 1 - e ** -(((10**6)**2)/(2*2**36)) = 0.999