Thanks. That is really intriguing. Did you look at those two files as plain text as well as via Ghostscript or similar? I'm not familiar enough with PostScript to understand how the 5 characters changed in the binary bit at the top can cause two otherwise identical documents to appear so different when formatted. Kind of reinforces my distaste for non-plain-text communication media.
Even though it is rare I try to avoid programming with the view "that probably won't ever cause problems".
Agreed. The 'problem' with the MD5 hash, and all other hashes for that matter, is applications that use them under the assumption that either clashes cannot happen, or are so rare that there is no need to verify them. That is especially true for security/cryptography applications.
A digest/hash function that can represent a document or file of any size with a short, fixed-length, truly 'unique' signature is mathematically impossible (a bit like infinite lossless compression :). There are more possible documents than possible digests, so some documents must share one, and any security application that relies on uniqueness is just plain broken.
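To make the counting argument concrete, here is a minimal sketch. It assumes a deliberately tiny "hash" (the first two bytes of an MD5 digest), so there are only 65,536 possible signatures; the helper names `truncated_md5` and `find_collision` are made up for illustration. Feed it 65,537 distinct inputs and at least two must clash.

```python
import hashlib


def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """Return only the first nbytes of the MD5 digest (a deliberately tiny hash)."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated digest."""
    seen = {}
    # One more input than there are possible digests, so a clash is guaranteed.
    limit = 256 ** nbytes + 1
    for i in range(limit):
        msg = str(i).encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg, tag
        seen[tag] = msg
    raise AssertionError("unreachable: more inputs than possible digests")


first, second, tag = find_collision()
print(first, second, tag.hex())
```

The same argument applies to the full 128-bit digest; the only difference is how long a blind search like this would take.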
About the best you can do is compute two or more different digests of the document, which should make it much, much harder to generate two disparate but meaningful documents that produce the same digests through the different hash functions.
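A rough sketch of that idea, using Python's standard `hashlib`. The choice of algorithms and the helper names `digests` and `same_document` are my own assumptions, not anything prescribed here:

```python
import hashlib


def digests(data: bytes, algorithms=("md5", "sha1", "sha256")) -> dict:
    """Compute several independent digests of the same data."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def same_document(a: bytes, b: bytes, algorithms=("md5", "sha1", "sha256")) -> bool:
    """Treat two documents as matching only if every digest agrees."""
    return digests(a, algorithms) == digests(b, algorithms)


print(digests(b"hello world"))
print(same_document(b"hello world", b"hello world!"))
```

A forger now has to find one pair of documents that clashes under every function at once, rather than under just one of them.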
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Did you look at those two files as plain text as well as via Ghostscript or similar? I'm not familiar enough with PostScript to understand how the 5 characters changed in the binary bit at the top can cause two otherwise identical documents to appear so different when formatted.
I looked -- actually there are 7 bytes that differ:
  byte     value in   value in   xor
  offset   "letter"   "order"    diff
  ------------------------------------
  0x53     97         17         80
  0x6d     a3         23         80
  0x6e     78         79         01
  0x7b     5a         da         80
  0x93     c8         48         80
  0xad     d8         58         80
  0xbb     6f         ef         80
Obviously there's some clue here about how md5 actually works. As for how this example was implemented, I can easily imagine putting both texts into the single .ps file, and using some binary differences at the top to toggle between ignoring (i.e. not displaying) one text or the other.
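For anyone who wants to reproduce a table like the one above, here is a minimal sketch of a byte-by-byte comparison. The function names and the file names `letter.ps` and `order.ps` are placeholders I made up; substitute whatever the two files are actually called.

```python
def diff_bytes(a: bytes, b: bytes):
    """Yield (offset, byte_in_a, byte_in_b, xor) wherever a and b differ."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            yield offset, x, y, x ^ y


def print_diff(path_a: str, path_b: str) -> None:
    """Read two files in binary mode and print their differing bytes in hex."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        print(f"lengths differ: {len(a)} vs {len(b)} (comparing common prefix)")
    for offset, x, y, d in diff_bytes(a, b):
        print(f"0x{offset:02x}  {x:02x}  {y:02x}  {d:02x}")


# Hypothetical usage, assuming the two files are saved locally:
# print_diff("letter.ps", "order.ps")
```

The xor column makes the pattern easy to see: a value of 80 means only the top bit of that byte was flipped.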
Kind of reinforces my distaste for non-plain-text communication media.
Amen to that.
About the best you can do is compute two or more different digests of the document, which should make it much, much harder to generate two disparate but meaningful documents that produce the same digests through the different hash functions.
As far as I know, most digests (MD4, MD5, SHA1...) rely upon the same mathematical rules and concepts (such as Galois groups). As virtually all of these digest systems have been "broken", it may be conceivable to forge two different files sharing more than one digest type!
it may be conceivable to forge two different files sharing more than one digest type!
Possible, but the 'simultaneous' nature of the problem of forging a single document that matches two (or more) separate and different checksums means that the task gets a whole lot tougher than just a simple multiplier effect. With each change of a bit having a different effect on each of the checksums, just generating one document to match both signatures is incredibly hard. Doing so whilst creating a document that actually says something meaningful, and relevant to your nefarious purpose, is tougher still.
The PostScript example above is really a cheat. The majority of the change appears to be a "markup" change that simply conceals the bulk of the original message--which just so happens to leave the remainder of the message suitable for the nefarious purpose.
In the absence of a carefully contrived starting point deliberately conducive to the nefarious purpose, even this "conceal the stuff that you don't want" method is really quite hard and unlikely. The fact that simply opening the document in a plain text editor shows the original content makes it more than a little suspect as a usable technique for anything other than demonstration purposes. Interesting, but mostly irrelevant.
My challenge still stands--though not without a little trepidation. I reserve the right to change the nature of the challenge to using one MD5 wrapped around a message with an embedded digest of some other form (say SHA1 or similar), as the power of CPUs and gridded networks rises, and the art of digital forgery gets more sophisticated, but I'm not ready to wimp out yet :)
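For the curious, the "MD5 wrapped around a message with an embedded digest" variant might look something like the sketch below. The trailer format (`SHA1: <hex>` on its own last line) and the function names `wrap` and `verify` are purely my assumptions for illustration; the challenge does not define them.

```python
import hashlib

TRAILER = b"\nSHA1: "  # hypothetical marker separating the message from its inner digest


def wrap(message: bytes):
    """Append a SHA-1 digest of the message, then take MD5 over the whole payload."""
    inner = hashlib.sha1(message).hexdigest().encode()
    payload = message + TRAILER + inner + b"\n"
    outer = hashlib.md5(payload).hexdigest()
    return payload, outer


def verify(payload: bytes, outer_md5: str) -> bool:
    """Check the outer MD5 first, then the SHA-1 embedded in the payload."""
    if hashlib.md5(payload).hexdigest() != outer_md5:
        return False
    body, sep, trailer = payload.rpartition(TRAILER)
    if not sep:
        return False  # no embedded digest found
    return trailer.rstrip(b"\n") == hashlib.sha1(body).hexdigest().encode()


payload, outer = wrap(b"Pay the bearer one beer.")
print(verify(payload, outer))
```

A forged payload would have to clash on the outer MD5 while its own body still matched the SHA-1 carried inside it.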