Mathematicians, I need your help.

Stated simply, I speculate that combining the length of a message with a good digest (say MD5) of that message to produce a 192-bit signature, is more reliable (eg.unique and secure), and far faster, than calculating a 256-bit(+) digest alone.

I'll save you all the primitive logic(*) that leads me to this conclusion and simply ask those versed in logic and math to:

  1. Counterpoint my assertion.
  2. Suggest practical methods of proof/verification.

Just to spice things up a little, I'll make a prediction: At some point in the near future, <a digest> + <message length> will become defacto-standard for security purposes.


(*)Update: I've been told offline that I probably shouldn't have omitted my primitive logic, so here it is. Ignoring non-digest length messages for simplicity and using MD5 as my example, though any good digest would do.

There will (on average) be one 16-byte message that maps to each of the possible MD5s. And there will be one 32-byte message that maps to each of the MD5s. And one 48-byte message that maps to each of the MD5s. And so on.

So, for messages of length 0 .. 2^64, there will be (on average) 2^59 messages that will map to each of the MD5s. But if you combine the length with the MD5, you get just 1 message, per combined 192-bit signature, over the same message space. Which reduces the chance of collisions, accidental or deliberate, to a tiny percentage relative to the MD5 alone.

And does so more effectively, and far more economically than moving to a 256-bit digest?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
  • Comment on (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
  • Select or Download Code

Replies are listed 'Best First'.
Re: Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by atcroft (Abbot) on Apr 26, 2009 at 08:10 UTC

    The combination does limit an attacker to having to create a message within much less leeway by removing the chosen prefix collision from the attacker's inventory.

    As a result, the attacker thus has to find a string of exactly the same length that results in the same hash value. Whether finding that is easier than finding a vulnerability in the 256-bit algorithm being used is left as an exercise to the reader. :)

Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by ikegami (Patriarch) on Apr 26, 2009 at 08:31 UTC

    For cryptographic hashes,

    • it is infeasible to find a message that has a given hash,
    • it is infeasible to modify a message without changing its hash, and
    • it is infeasible to find two different messages with the same hash.

    Your approach isn't likely to help the second and third feature I mentioned.

    For example, the successful attacks against MD5 created two messages of the same length with the same signature. Your variation is no more secure than plain MD5 in that case.

Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by moritz (Cardinal) on Apr 26, 2009 at 09:16 UTC
    In the general case of cryptographic hashes you don't want the hash to reveal anything about your message.

    In the case where that's not important, it is a good idea to provide the length information in the protocol separately. For example when you sign a public key with PGP or gpg, both a hash of that key and its length are signed.

Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by ikegami (Patriarch) on Apr 26, 2009 at 08:39 UTC

    In my previous post, I considered what good would come of using your method. Then I got to thinking if there were any bad effects of including the length. For small messages like passwords, the harm is obvious. It would make them easier to guess. I don't know about the harm for longer messages, but an information leak of any kind can't possibly be good.

    I thought of using a salted hash of the length instead, but that wouldn't help.

Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by JavaFan (Canon) on Apr 26, 2009 at 21:47 UTC
    There will (on average) be one 16-byte message that maps to each of the possible MD5s. And there will be one 32-byte message that maps to each of the MD5s. And one 48-byte message that maps to each of the MD5s. And so on.
    I don't follow this logic. There are a possible of 2^128 possible 128bit hashes. 2^128 == 8^16, the number of possible 16-byte messages. So that indeeds maps, on average 1 message to each hash.

    But there are 8^32 possible 32 byte messages. Mapping to a possible 2^128 128bit hashes. Which means that, on average, 8^16 messages map to each hash. All with the same length.

    So, for messages of length 0 .. 2^64, there will be (on average) 2^59 messages that will map to each of the MD5s.
    I don't think you are right. Just looking at messages of 2^64 bytes, you have 8^(2^64) different messages, or 2^(3*2^64). Mapping to a space of 2^128. Which means that on average, 2^(3*2^64 - 128) messages map to the same hash. Or 2^55340232221128654720 different messages. Which is a really big number.
Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by ELISHEVA (Prior) on Apr 26, 2009 at 16:58 UTC

    I am also not a crypto expert, but I have spent a little time today researching this on the web. I've cited sources so that those who are expert (or at least better readers than I) may look them up and correct any of my misunderstandings.

    The reported problems with MD5 (and several other digest hashes) come from the fact that two equal length strings can be found with the same MD5 hash. Adding message length information to the digest will not eliminate the collision risk posed by hash collisions for equal length messages.

    The reason why the number of bits in a digest (128 vs. 256) is emphasized so much is that the longer the hash, the less information loss via the digest process. Less information loss means fewer same length strings with the same hash and hence lower probability of meaningful collision. The security of a digest mechanism lies in its ability to hide exactly where the information loss is. When that has been uncovered, the security leaks away. (see here for a brief discussion of digest hashes and "collision resistance")

    Now for the question of what counts as a "good" hash. MD5 apparently is not it.

    For a message of 1024 bits it turns out that there are not one, but several pairs of same length strings with the same MD5 hash. It takes a little over an hour to find such a pair of strings, even when the message is 1024 bits long - see Collisions for Hash Functions, Wang et al, 2004. Once a first pair is found, subsequent equal length pairs can be found within 15 seconds to 5 minutes.

    For a long time it was thought that there was no practical use to this collision risk (see Hash Collision Q&A). Even though it did not take a long time to compute the collision, one couldn't control which strings collided with a particular string. Finding a useful (and malicious) string that collided with a pre-existing digest was considered a minimal risk.... until very recently. In December, 2008, a paper was presented at the 25th Chaos Conference demonstrating the successful creation of a rogue CA certificate that appeared to be signed by Verisign. It took about two days using a grid of 200 Sony playstations. Whether or not the actual and rogue certificates had the same length I don't know. Details of the attack algorithm have been self-censored (on-line) by the paper's authors.

    However, I suspect that the exploit did not require different length strings. Wang's research noted that the strings with the common hashes were identical in only the first half of the strings. And according to Wikipedia, the standard collision finding algorithm examines a template file containing a modifiable 128 byte block of data aligned on a 64 byte boundary. In other words, it exploits the fact that MD5 is only digesting part of the information in the file.

    Two days after the Chaos conference, Microsoft issued a security advisory. Although the lead paragraphs downplay the significance of the attack, if you look in the FAQ section, you'll see that Microsoft has decided to stop using MD5 as a default signing option on its software.

    Verisign , whose certificates were duped, also downplayed the risk, but only by emphasizing that it is phasing out MD5 as a hash and moving to SHA-1. However, the authors of the Chaos conference paper observe that SHA-1 has similar problems to MD5 and it is only a matter of time until it becomes vulnerable. (see also The Dangers of Cracking Hash). Due to concerns about "collision resistance", the US Government has deprecated even SHA1 with the intent of requiring all federal agencies to use only SHA2 by 2010 (see NIST hash policy).

    Best, beth

      However, I suspect that the exploit did not require different length strings

      I saw two messages produced using the weakness, and they had the same length.

      Now for the question of what counts as a "good" hash. MD5 apparently is not it.

      Some uses would not be affected by its vulnerability. For example, password hashing would not be affected, since finding a message that has a given hash hasn't been broken. Yet. Attacks only get better.

      Due to concerns about "collision resistance", the US Government has deprecated even SHA1 with the intent of requiring all federal agencies to use only SHA2 by 2010

      I believe SHA-2 (aka SHA-224, SHA-256, SHA-384 and SHA-512) is also related to MD5 and SHA-1, but the attack against MD5 hasn't been shown to be feasible against SHA-2 yet.

      In light of these attacks and the lack of alternatives to this family of hashes, NIST launched a search to find a new hashing algorithm, to be dubbed SHA-3.

      Where SHA-2 favours 32-bit processors, SHA-3 will favour 64-bit processors.

Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by roboticus (Chancellor) on Apr 26, 2009 at 14:29 UTC
    BrowserUK:

    By telling the codebreaker how long the message is, you're greatly shrinking the search space they have to look through to find the original message. But remember, I'm not a mathmetician in real life, nor do I play one on T.V.

    ...roboticus
Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone.
by duelafn (Parson) on Apr 26, 2009 at 16:19 UTC

    I am not a cryptoanalyst but books on the subject typically assert that an N bit hash is superior to the concatenation of n-bit and m-bit hashes where n + m = N. I will try to look up more info later...

    Update: Slightly misunderstood OP (but hey, length is just a weak hash digest anyway right!). There is a lovely example involving PostScript files which display distinct letters when viewed or printed but have same md5sum and same length.

    More generally, here is a paper that proves the claim that concatenation of "different" hash digests will not increase security over a single hash digest of length equal to the concatenated length.

    Jonathan J. Hoch and Adi Shamir, Breaking the ICE - Finding Multicollisions in Iterated Concatenated and Expanded (ICE) Hash Functions, proc. FSE 2006

    Good Day,
        Dean