Re^7: On showing the weakness in the MD5 digest function and getting bitten by scalar context

I recently ran a process that produced 100 million md5s of randomly generated data. I hit duplicates in that process on at least two runs.

Are you absolutely certain that the random data itself contained no duplicates? I would be very interested to see these MD5 collisions of yours.
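One way to settle that question is to keep the inputs alongside their digests, so a repeated input can be told apart from a genuine collision. Below is a minimal sketch in Python; the function name `classify_duplicates` and the sample data are made up for illustration, and holding 100 million entries in memory like this would probably not be practical without sorting the digests on disk instead.

```python
import hashlib
from collections import defaultdict

def classify_duplicates(messages):
    """Group messages by MD5 digest; separate repeated inputs from real collisions."""
    by_digest = defaultdict(set)   # digest -> set of distinct inputs seen
    repeated_inputs = set()        # inputs that occurred more than once
    for msg in messages:
        digest = hashlib.md5(msg).hexdigest()
        if msg in by_digest[digest]:
            repeated_inputs.add(msg)
        by_digest[digest].add(msg)
    # A real collision is one digest shared by two or more *different* inputs.
    collisions = {d: msgs for d, msgs in by_digest.items() if len(msgs) > 1}
    return repeated_inputs, collisions

# Hypothetical sample: "alpha" appears twice, so its digest repeats too.
sample = [b"alpha", b"beta", b"alpha", b"gamma"]
repeated, collisions = classify_duplicates(sample)
print("repeated inputs:", repeated)
print("true collisions:", collisions)
```

If every duplicate digest lands in the first bucket, the generator repeated itself and MD5 did nothing wrong.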

I think that the odds of generating 2 matching pairs from 500 million is probably well within statistical norms.

Sure. If you reduce MD5 to about 30 bits.
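The arithmetic can be sketched with the usual birthday approximation: among n uniformly random b-bit values there are n(n-1)/2 pairs, each matching with probability 2^-b. The helper names below are my own, and the sketch assumes the digests behave like uniform random values. For 100 million digests the break-even width comes out at roughly 52 bits, and for 500 million at roughly 57 bits, nowhere near 128.

```python
import math

def expected_colliding_pairs(n, bits):
    """Birthday approximation: n*(n-1)/2 pairs, each matching with probability 2**-bits."""
    return n * (n - 1) / 2 / 2.0 ** bits

def bits_for_one_expected_pair(n):
    """Digest width at which about one colliding pair is expected among n random values."""
    return math.log2(n * (n - 1) / 2)

for n in (100_000_000, 500_000_000):
    print(
        n,
        "expected pairs at 128 bits:", expected_colliding_pairs(n, 128),
        "break-even bits:", round(bits_for_one_expected_pair(n), 1),
    )
```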

However, if you gave me an md5 and asked me to find a plaintext that matched it, without giving me the plaintext you had used to generate it, that would be computationally infeasible. This, I believe, is what the md5 algorithm is intended to achieve.

This is known as the Preimage Problem. It's much more difficult than the Collision Problem (finding two messages with the same MD5). Cryptographic hashes are supposed to prevent someone from doing either one. You can answer "no they aren't" again if you want, but you will still be wrong.
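The gap between the two problems can be seen on a toy scale by truncating MD5 to a few bits and brute-forcing both. This is only a sketch: the 16-bit width, the function names and the message formats are assumptions chosen so the search finishes quickly, and it says nothing about attacks on the full 128-bit digest. For an n-bit digest, a generic collision search needs on the order of 2^(n/2) tries, while a generic preimage search needs on the order of 2^n.

```python
import hashlib

BITS = 16  # toy digest width (an assumption for illustration; real MD5 is 128 bits)

def toy_digest(msg):
    """First BITS bits of MD5(msg), as an integer."""
    full = int.from_bytes(hashlib.md5(msg).digest(), "big")
    return full >> (128 - BITS)

def find_collision():
    """Try distinct messages until any two share a toy digest."""
    seen = {}
    i = 0
    while True:
        msg = b"msg-%d" % i
        d = toy_digest(msg)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg
        i += 1

def find_preimage(target, limit=5_000_000):
    """Try distinct messages until one hits a digest fixed in advance."""
    for i in range(limit):
        msg = b"try-%d" % i
        if toy_digest(msg) == target:
            return msg, i + 1
    return None, limit

# Collision problem: any two messages will do.
m1, m2, collision_tries = find_collision()

# Preimage problem: the target digest is dictated by someone else.
target = toy_digest(b"a plaintext you never get to see")
preimage, preimage_tries = find_preimage(target)

print("collision after", collision_tries, "tries:", m1, m2)
print("preimage after", preimage_tries, "tries:", preimage)
```

Every bit added to the digest doubles the expected preimage work, but the collision work only doubles for every two bits added.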

Alternatively, take my trojan binary and the md5 from some trusted piece of code, and then tell me what bytes I need to insert into data space (and where) within that binary in order for its md5 to match that of the trusted piece of software. That would be a vulnerability that would make me consider md5 broken.

There are more uses of MD5 than are dreamt of in your philosophy, Horatio.

Re^8: On showing the weakness in the MD5 digest function and getting bitten by scalar context by Anonymous Monk on Aug 28, 2004 at 03:13 UTC
Sure. If you reduce MD5 to about 30 bits.

60 bits. My bad. Point stands.
Re^8: On showing the weakness in the MD5 digest function and getting bitten by scalar context by BrowserUk (Patriarch) on Aug 28, 2004 at 08:34 UTC
From the RFC (which you appear to be (mis)quoting) -- my highlighting:

This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest.

The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

Cryptographic hashes are supposed to prevent someone from doing either one.

Nowhere in that do I see MD5 described as a "cryptographic hash"? Any application that uses a "digital signature" as a "cryptographic hash" based upon "conjectured...computational infeasibility" is a misapplication of the algorithm. If the application needs a "cryptographic hash", it should be using one.

There are more uses of MD5 than are dreamt of in your philosophy, Horatio.

Ah yes, my dear ~~Josephine~~ Hardy*, but how many of them are misuses?

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^9: On showing the weakness in the MD5 digest function and getting bitten by scalar context by Anonymous Monk on Aug 29, 2004 at 15:06 UTC
Ok, you win. "Cryptographic hash" is not the formally correct terminology. I hang my head in shame.

What I find interesting is that you dig up a quotation that says exactly what I've been telling you, but with different words, and use it to claim that I'm wrong. I wasn't quoting the RFC. It's interesting that they used almost the same language I did.

You seem to think that your quotation says MD5 is a digital signature. It does not. It's telling you that MD5 can be used as part of a digital signature protocol. You also seem to think that the RFC is giving you a comprehensive list of all the allowable uses of MD5. It's not. It says "MD5 is part of this complete breakfast," but you don't have to eat it that way. If you want to learn what you should eat for breakfast, you have to study nutrition, not breakfast cereal advertisements.
Re^10: On showing the weakness in the MD5 digest function and getting bitten by scalar context by Anonymous Monk on Aug 29, 2004 at 15:09 UTC
Do they say "chocolate frosted sugar bombs is part of this complete breakfast" in advertisements in the UK? Are you even in the UK? Whatever.
Re^10: On showing the weakness in the MD5 digest function and getting bitten by scalar context by BrowserUk (Patriarch) on Aug 29, 2004 at 15:35 UTC
I didn't "dig it up", I went to the source. The same place I went to at the very beginning.

What I find interesting is that you read

The MD5 algorithm is intended for digital signature applications,...

and then say:

You seem to think that your quotation says MD5 is a digital signature. It does not.

Simple, clear language. That people, not those who designed it, have taken

It is conjectured that it is computationally infeasible to produce two messages having the same message digest...

to mean

It is ... <close eyes> ... computationally infeasible to produce two messages having the same message digest...

and then rant and rave when it doesn't, is not the fault of the designers of MD5.

Talking of UK advertising slogans: there is a really good, long-running one for a range of paint products here in the UK. The catchphrase is:

(Ronseal) Does exactly what it says (it will) on the packet.
Re^11: On showing the weakness in the MD5 digest function and getting bitten by scalar context by Anonymous Monk on Aug 30, 2004 at 04:49 UTC
Re^12: On showing the weakness in the MD5 digest function and getting bitten by scalar context by BrowserUk (Patriarch) on Aug 30, 2004 at 09:50 UTC