comment on

I am also not a crypto expert, but I have spent a little time today researching this on the web. I've cited sources so that those who are expert (or at least better readers than I) may look them up and correct any of my misunderstandings.

The reported problems with MD5 (and several other digest hashes) come from the fact that two equal length strings can be found with the same MD5 hash. Adding message length information to the digest will not eliminate the collision risk posed by hash collisions for equal length messages.

The reason why the number of bits in a digest (128 vs. 256) is emphasized so much is that the longer the hash, the less information loss via the digest process. Less information loss means fewer same length strings with the same hash and hence lower probability of meaningful collision. The security of a digest mechanism lies in its ability to hide exactly where the information loss is. When that has been uncovered, the security leaks away. (see here for a brief discussion of digest hashes and "collision resistance")

Now for the question of what counts as a "good" hash. MD5 apparently is not it.

For a message of 1024 bits it turns out that there are not one, but several pairs of same length strings with the same MD5 hash. It takes a little over an hour to find such a pair of strings, even when the message is 1024 bits long - see Collisions for Hash Functions, Wang et al, 2004. Once a first pair is found, subsequent equal length pairs can be found within 15 seconds to 5 minutes.

For a long time it was thought that there was no practical use to this collision risk (see Hash Collision Q&A). Even though it did not take a long time to compute the collision, one couldn't control which strings collided with a particular string. Finding a useful (and malicious) string that collided with a pre-existing digest was considered a minimal risk.... until very recently. In December, 2008, a paper was presented at the 25th Chaos Conference demonstrating the successful creation of a rogue CA certificate that appeared to be signed by Verisign. It took about two days using a grid of 200 Sony playstations. Whether or not the actual and rogue certificates had the same length I don't know. Details of the attack algorithm have been self-censored (on-line) by the paper's authors.

However, I suspect that the exploit did not require different length strings. Wang's research noted that the strings with the common hashes were identical in only the first half of the strings. And according to Wikipedia, the standard collision finding algorithm examines a template file containing a modifiable 128 byte block of data aligned on a 64 byte boundary. In other words, it exploits the fact that MD5 is only digesting part of the information in the file.

Two days after the Chaos conference, Microsoft issued a security advisory. Although the lead paragraphs downplay the significance of the attack, if you look in the FAQ section, you'll see that Microsoft has decided to stop using MD5 as a default signing option on its software.

Verisign , whose certificates were duped, also downplayed the risk, but only by emphasizing that it is phasing out MD5 as a hash and moving to SHA-1. However, the authors of the Chaos conference paper observe that SHA-1 has similar problems to MD5 and it is only a matter of time until it becomes vulnerable. (see also The Dangers of Cracking Hash). Due to concerns about "collision resistance", the US Government has deprecated even SHA1 with the intent of requiring all federal agencies to use only SHA2 by 2010 (see NIST hash policy).

Best, beth

In reply to Re: (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone. by ELISHEVA
in thread (OT)Speculation: 128-bit digest + 64-bit length (192-bits) is more reliable and unique than a 256-digest alone. by BrowserUk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.