
CRC-32 is a bad idea. It's completely linear with respect to XOR (strictly speaking affine, because of its 0xFFFFFFFF initial value and final XOR), which makes it easy to attack: given the CRC of some data, it's not hard to compute the CRC of other data that's mostly the same. Cryptographic hash functions like MD5 have nonlinear steps that make this difficult.
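A quick way to see the linearity: for any three messages of the same length, the CRC-32 of their bytewise XOR equals the XOR of their three CRCs, whatever the messages contain. No such relation holds for MD5. This is a minimal sketch; the bitwise crc32 below is a hand-written stand-in for the standard zip/zlib CRC-32 (polynomial 0xEDB88320) rather than a CPAN module, and the three messages are arbitrary.

```perl
use strict;
use warnings;

# Standard CRC-32 (reflected polynomial 0xEDB88320, initial value and
# final XOR 0xFFFFFFFF), computed one bit at a time.
sub crc32 {
    my ($data) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

# Three arbitrary messages, all 14 bytes long.
my ($x, $y, $z) = ('pay alice $100', 'pay bobby $999', 'hello, world!!');

my $lhs = crc32($x ^ $y ^ $z);                # CRC of the bytewise XOR
my $rhs = crc32($x) ^ crc32($y) ^ crc32($z);  # XOR of the three CRCs

printf "%08x\n%08x\n", $lhs, $rhs;            # the same value twice
```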

The reason hash functions produce such long outputs is to resist birthday attacks. That's where someone finds any two inputs that hash to the same output. It sounds like your system won't be vulnerable to a birthday attack, though, since the users don't pick the input to the hash function; you pick it for them. I have to echo everyone else and say, "it's probably OK to shorten MD5."
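To put numbers on the birthday effect: with an n-bit digest you expect a collision between some pair of inputs after roughly 2**(n/2) hashes, while hitting one particular digest takes about 2**n. The sketch below assumes MD5 truncated to its first 6 hex digits (24 bits, chosen only so it runs quickly) and hashes made-up inputs "id-0", "id-1", ... until two of them share a truncated digest. Expect it to stop after something like a few thousand inputs, not millions. It uses Digest::MD5, which ships with Perl.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $hex_digits = 6;    # keep 6 hex digits = 24 bits of the MD5 digest
my %seen;              # truncated digest => input that produced it
my ($first, $second, $tries);

# 16**6 + 1 inputs into 16**6 buckets: a collision is guaranteed.
for my $i (0 .. 16**$hex_digits) {
    my $input = "id-$i";    # hypothetical inputs; any distinct strings work
    my $short = substr(md5_hex($input), 0, $hex_digits);
    if (exists $seen{$short}) {
        ($first, $second, $tries) = ($seen{$short}, $input, $i + 1);
        last;
    }
    $seen{$short} = $input;
}

print "collision after $tries inputs: '$first' and '$second' both start with ",
      substr(md5_hex($first), 0, $hex_digits), "\n";
```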

BTW, the name "birthday attack" comes from the observation that if you walk into a room containing 20 people, it's unlikely that any of them has the same birthday as you. However, it's fairly likely that two of them have the same birthday as each other.
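The arithmetic behind that, assuming 365 equally likely birthdays and ignoring leap years: someone shares your birthday with probability 1 - (364/365)**20, about 5%, whereas some pair in the room shares a birthday with probability 1 - (365/365)(364/365)...(346/365), about 41%.

```perl
use strict;
use warnings;

my $days   = 365;   # assumption: uniform birthdays, no leap day
my $people = 20;

# P(at least one of the 20 has *your* birthday)
my $p_you = 1 - (($days - 1) / $days) ** $people;

# P(at least two of the 20 share a birthday):
# 1 - (365/365) * (364/365) * ... * ((365 - 19)/365)
my $p_none = 1;
$p_none *= ($days - $_) / $days for 0 .. $people - 1;
my $p_pair = 1 - $p_none;

printf "someone shares your birthday: %.3f\n", $p_you;    # 0.053
printf "some pair shares a birthday:  %.3f\n", $p_pair;   # 0.411
```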

Re: Re: Re: How safe is truncating an MD5 digest string?
by John M. Dlugosz (Monsignor) on Sep 12, 2001 at 01:02 UTC
    Are you saying that given a size and a target CRC checksum other than zero, it's easy to compose a message of length size that produces the target checksum?

    Making a small change to the data, including changing one bit, should produce a totally different checksum, since that's what it was designed to do in the first place.

    —John

      Are you saying that given a size and a target CRC checksum other than zero, it's easy to compose a message of length size that produces the target checksum?
      Yes.
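Here is one way to do it, as a minimal sketch: each bit step of CRC-32 is invertible, so you can run the register backwards from the target value and work out exactly which 4 bytes to append to an arbitrary prefix. For a message of length size, pick any size - 4 bytes as the prefix. The helper name forge_suffix, the prefix, and the target 0xDEADBEEF are made up for illustration, and crc32 is a hand-written bitwise stand-in for the standard zip/zlib CRC-32 (polynomial 0xEDB88320).

```perl
use strict;
use warnings;

use constant POLY => 0xEDB88320;   # reflected CRC-32 polynomial

# Standard CRC-32 (init and final XOR 0xFFFFFFFF), one bit at a time.
sub crc32 {
    my ($data) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ POLY) : ($crc >> 1);
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

# Hypothetical helper: returns the 4 bytes to append to $prefix so that
# crc32($prefix . $suffix) == $target.
sub forge_suffix {
    my ($prefix, $target) = @_;
    my $state = crc32($prefix) ^ 0xFFFFFFFF;  # register after the prefix
    my $want  = $target ^ 0xFFFFFFFF;         # register we need at the end

    # Undo 32 single-bit steps. A forward step XORs in POLY exactly when it
    # shifts out a 1 bit, which leaves the top bit set, so the top bit
    # tells us how to reverse each step.
    for (1 .. 32) {
        $want = ($want & 0x80000000)
              ? ((($want ^ POLY) << 1) | 1) & 0xFFFFFFFF
              : ($want << 1) & 0xFFFFFFFF;
    }

    # Feeding 4 bytes is the same as XORing them into the register as a
    # little-endian word and then doing 32 steps, so solve for that word.
    return pack 'V', $want ^ $state;
}

my $size   = 16;                   # desired message length (must be >= 4)
my $target = 0xDEADBEEF;           # desired checksum
my $prefix = 'x' x ($size - 4);    # any $size - 4 bytes will do
my $forged = $prefix . forge_suffix($prefix, $target);

printf "length %d, crc32 %08x\n", length($forged), crc32($forged);
# prints: length 16, crc32 deadbeef
```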

      Also, if you know the CRC of some data, you can calculate the CRC of "data xor something", even if you don't know what the data was!

          use String::CRC32;

          # given $crc == crc32($data)
          $crc2 = $crc ^ crc32($diff) ^ crc32("\0" x length($diff));

          # now, $crc2 == crc32($data ^ ($pad . $diff))
          # where $pad = "\0" x (length($data) - length($diff))
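A self-contained check of that snippet on concrete strings. It is a sketch that replaces String::CRC32 with a hand-written bitwise CRC-32 (the standard zip/zlib variant, polynomial 0xEDB88320, which String::CRC32 is meant to match) so it runs without CPAN; the message and the XOR mask are made up. $data is used only to confirm the answer, never to compute $crc2.

```perl
use strict;
use warnings;

# Stand-in for String::CRC32's crc32(): standard CRC-32, reflected
# polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF.
sub crc32 {
    my ($data) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

my $data = 'pay bob $100';      # the "unknown" data
my $crc  = crc32($data);        # all the attacker is assumed to know
my $diff = "\x08\x00\x00";      # XOR mask for the last 3 bytes: '1' -> '9'

# The formula from the post: uses only $crc and $diff, never $data.
my $crc2 = $crc ^ crc32($diff) ^ crc32("\0" x length($diff));

# Check it against the CRC of the actually modified data.
my $pad      = "\0" x (length($data) - length($diff));
my $modified = $data ^ ($pad . $diff);    # 'pay bob $900'
printf "%s: predicted %08x, actual %08x\n", $modified, $crc2, crc32($modified);
```

The mask turns "$100" into "$900", and the predicted checksum matches without the original text ever being read.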
        So, although it's good at finding accidental mutations to data, it's quite easy to contrive a change that produces a given checksum.

        It seems to me that for any hash function with only 2**32 different fingerprints, you can find an input that produces a given fingerprint by brute force.
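That matches the usual estimate for finding an input with a given fingerprint: about 2**n guesses for an n-bit fingerprint, so on the order of four billion tries for 2**32. A scaled-down sketch, assuming a 16-bit fingerprint (the first 4 hex digits of MD5) and a made-up target message so that it finishes quickly; expect a match after roughly 65,000 guesses on average.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $hex_digits  = 4;    # 16-bit fingerprint, an assumption for speed
my $fingerprint = sub { substr(md5_hex($_[0]), 0, $hex_digits) };

my $target = $fingerprint->('the original message');   # hypothetical target
my ($found, $tries);
for my $i (0 .. 2**24) {                                # generous upper bound
    my $guess = "guess-$i";
    if ($fingerprint->($guess) eq $target) {
        ($found, $tries) = ($guess, $i + 1);
        last;
    }
}

print defined($found)
    ? "'$found' matches fingerprint $target after $tries tries\n"
    : "no match found\n";
```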