huguei has asked for the wisdom of the Perl Monks concerning the following question:

Hi.
I recently read in the "Writing Apache Modules" book (page 308) about an exploit of the MD5 hashing algorithm. The book's countermeasure is meant to prevent "a malicious user from appending extra information to the end of the ticket by exploiting one of the mathematical properties of the MD5 algorithm".

The book's recommendation is to always compute the MD5 hash twice over the data.

My question is: what is that exploit? AFAIK, the MD5 algorithm is "collision-free", so appending extra data doesn't give the same signature.

Thanks for your guidance.

Huguei

Re: Why applying MD5 hash twice?
by hardburn (Abbot) on Sep 09, 2003 at 16:26 UTC

    Although not completely broken, there are enough problems with MD5 to make most cryptographers nervous. SHA1 is a much more robust algorithm, and has a longer hash value to boot. If you happen to be stuck using MD5, there's not much you can do, but do try to use SHA1 whenever possible.
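    To put numbers on "a longer hash value": MD5 produces 128 bits and SHA-1 produces 160. The following minimal sketch, assuming a Perl with the Digest::MD5 and Digest::SHA modules (both ship with Perl 5.10 and later), prints both digests of the same string along with their sizes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha1_hex);

my $data = 'abc';

# Hex output: each hex character encodes 4 bits of the digest.
my $md5  = md5_hex($data);
my $sha1 = sha1_hex($data);

printf "MD5:   %s (%d bits)\n", $md5,  4 * length $md5;
printf "SHA-1: %s (%d bits)\n", $sha1, 4 * length $sha1;
```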

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: Why applying MD5 hash twice?
by wufnik (Friar) on Sep 09, 2003 at 16:27 UTC
    some quick notes re: md5

    dobbertin 1996 was the first, as far as i know, to find collisions (two inputs with a common output) for the *compression function* inside md5. RFC1828, security considerations section, gives a high-level view of this.

    in essence, this would allow an attacker, under *very* stringent (read: peculiar) conditions, to attack digital signatures of files, which is what md5 is most commonly used for. still, Schneier at Counterpane does not seem to like it.

    so yes, you might want to md5 twice. but the general consensus is that SHA is more secure. here is yet another wee gem from Adam Back's cypherspace site, which computes SHA-1 instead.

    #!/usr/bin/perl -iD9T4C`>_-JXF8NMS^$#)4=L/2X?!:@GF9;MGKH8\;O-S*8L'6
    @A=unpack"N*",unpack u,$^I;@K=splice@A,5,4;sub M{($x=pop)-($m=1+~0)*int$x/$m};
    sub L{$n=pop;($x=pop)<<$n|2**$n-1&$x>>32-$n}@F=(sub{$b&($c^$d)^$d},$S=sub{$b^$c
    ^$d},sub{($b|$c)&$d|$b&$c},$S);do{$l+=$r=read STDIN,$_,64;$r++,$_.="\x80"if$r<
    64&&!$p++;@W=unpack N16,$_."\0"x7;$W[15]=$l*8 if$r<57;for(16..79){push@W,L$W[$_
    -3]^$W[$_-8]^$W[$_-14]^$W[$_-16],1}($a,$b,$c,$d,$e)=@A;for(0..79){$t=M&{$F[$_/
    20]}+$e+$W[$_]+$K[$_/20]+L$a,5;$e=$d;$d=$c;$c=L$b,30;$b=$a;$a=$t}$v='a';@A=map{
    M$_+${$v++}}@A}while$r>56;printf'%.8x'x5 ."\n",@A
    hope that helps,

    wufnik

    ...in the world of the mules there are no rules

      Please don't ever use code like this, for several reasons:

      1. The code is obfuscated, and therefore the only point of it is to display the author's cleverness. It was not meant to be used as a library.
      2. Chances are, you don't understand it at a glance. Using code you don't understand can lead to "Cargo Cult Programming", which is a dangerous habit to fall into.
      3. Worse, since the code is (intentionally) difficult to read, it probably hasn't had any peer review. This code could have bugs or security holes. Encryption code that hasn't had some kind of intense peer evaluation should *NEVER EVER* be used.

      If you want to use the SHA algorithm, please use Digest::SHA2 or Digest::SHA1.
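      A hedged sketch of following that advice. Here I use Digest::SHA, a different module from the two named above (my substitution, chosen because it is bundled with modern Perls). It covers SHA-1 as well as SHA-256 from the SHA-2 family, and its sha1_hex function has the same name as the one in Digest::SHA1. The object interface lets you feed data in pieces, which matters for large inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex sha256_hex);

# Functional interface: hash a whole string at once.
my $one_shot = sha256_hex("hello world");

# Object interface: feed the same bytes in chunks (e.g. while reading a
# large input) and get an identical digest at the end.
my $ctx = Digest::SHA->new(256);
$ctx->add("hello ");
$ctx->add("world");
my $chunked = $ctx->hexdigest;

print "SHA-1:   ", sha1_hex("hello world"), "\n";
print "SHA-256: $one_shot\n";
print "chunked digest matches: ", ($one_shot eq $chunked ? "yes" : "no"), "\n";
```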

        I agree with what you're saying, but to play devil's advocate, point 2 could apply to a lot of people who use CPAN.

        -Lee

        "To be civilized is to deny one's nature."
        I would just like to make sure we are all clear that md5 is NOT an encryption algorithm; it is a hashing algorithm, and yes, there is a big difference. But I do agree with you that encryption algorithms that have not had years of review should never be used.
        my purpose in posting the code was mainly to draw attention to what i considered an ingenious piece of code, and possibly also to the cypherspace site.

        the code would not be appropriate for use in another script, mostly because of its obfuscated nature, but also because of the perl command-line args it needs. apologies if this was unclear.

        Given this, I should say the goal here is obviously art, sadly not mine. While i would not use the 3-line perl/bc RSA in anger, nor the above SHA, nor (more directly relevant) MD5 in 8 lines, i still find them all powerful demonstrations of perl's beauty.

        thus the inclusion.

        ...wufnik

        -- in the world of the mules there are no rules --

      I'd need to dig through my notes to be sure, but I used a SHA implementation very much like this years ago. (I had to support Perl 4 installations and could not require any modules.<shrug/>)

      Unfortunately, the one I used had a bug when dealing with long strings. At a glance, I couldn't tell if this one has the problem or not. In that case, I worked with the author of the code to fix the bug and we did use it for years.

      However, as mentioned by others, using the CPAN modules would be a much better idea. Believe me, you do not want to try to find a bug in an implementation of a cryptographic hash algorithm.

      G. Wade
Re: Why applying MD5 hash twice?
by zakzebrowski (Curate) on Sep 09, 2003 at 20:03 UTC
    Unknown, but where I work it has been deemed 'a standard algorithm', so I use it... Also (as an aside), be sure to check first that the file is readable, because otherwise you'll get the md5 hash of empty input (d41d8c...). Likewise, any file with a content length of 0 will have that same hash...
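    A minimal sketch of the readability check Zak describes, assuming you hash with the core Digest::MD5 module's addfile method; the subroutine name md5_of_file is hypothetical. Opening the file explicitly and dying on failure means an unreadable file is reported as an error instead of silently producing the digest of empty input. A genuinely empty file still hashes to that same value, so the sketch merely flags it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: returns the hex MD5 of a file's contents, or dies
# if the file cannot be read, so a failed read is never mistaken for the
# digest of an empty input.
sub md5_of_file {
    my ($path) = @_;
    die "not a readable file: $path\n" unless -f $path && -r _;
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Digest of zero bytes: a zero-length file legitimately produces this too.
my $empty = Digest::MD5->new->add('')->hexdigest;

for my $path (@ARGV) {
    my $hex = eval { md5_of_file($path) };
    if (defined $hex) {
        print "$hex  $path", ($hex eq $empty ? "  (empty input!)" : ""), "\n";
    } else {
        warn $@;
    }
}
```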

    ----
    Zak
Re: Why applying MD5 hash twice?
by Abigail-II (Bishop) on Sep 09, 2003 at 15:40 UTC
    Considering that MD5 takes an arbitrary string with an arbitrary length and maps that to a 128 bit string, it's certainly not collision free.
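    Abigail's point is the pigeonhole principle: infinitely many inputs map onto 2**128 outputs, so collisions must exist, and "collision-free" can only mean that they are hard to find. The sketch below illustrates this on a deliberately weakened hash: it keeps only the first 24 bits of MD5 (the truncation is my own choice, purely to make the search fast) and, as the birthday bound predicts, typically finds two different inputs with the same truncated digest after a few thousand attempts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Truncate MD5 to its first 6 hex digits (24 bits). This weakened hash is
# for illustration only; real MD5 keeps all 128 bits.
sub tiny_hash { return substr(md5_hex($_[0]), 0, 6) }

my %seen;    # truncated digest => first input that produced it
my ($first, $second);
for my $i (0 .. 1_000_000) {
    my $msg = "message $i";
    my $h   = tiny_hash($msg);
    if (exists $seen{$h}) {
        ($first, $second) = ($seen{$h}, $msg);
        last;
    }
    $seen{$h} = $msg;
}

die "no collision found\n" unless defined $second;
printf "'%s' and '%s' both map to %s\n", $first, $second, tiny_hash($first);
```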

    Having said that, your question lacks anything Perl specific. You've more chance of getting a useful answer in a more appropriate forum - md5 isn't language specific.

    Abigail

      Having said that, your question lacks anything Perl specific. You've more chance of getting a useful answer in a more appropriate forum - md5 isn't language specific.

      While not Perl-specific, huguei's question is most certainly Perl-related. For example, tons of people writing web apps (he refers to Apache, so probably including him) use MD5 for sessions. I'm sure there are tons of other uses in Perl. Besides, there's no "MD5 monks" that I'm aware of, and while there may be some place else that has the answer, there are certainly plenty of experts here (yourself included) who have it as well, and I don't see why asking here is any less valid.

      Also, the post is helpful to people like me who had no knowledge of this vulnerability.

      My 2 cents

      Lobster Aliens Are attacking the world!
        The problem with that reasoning is that someone else could argue along the same lines: there are tons of people writing Windows applications, so we should discuss all Windows vulnerabilities here. However, the original post wasn't a warning about a newly discovered vulnerability (this one has been known for years). The original post was a question: what's that exploit?

        I fail to see how that's Perl related, or why this forum is an appropriate place to ask. The fact that there isn't an "MD5 monks" doesn't make this appropriate either. There are a billion things for which there's no "X monks", does that mean all questions about them should be asked here?

        However, while there isn't an MD5 website in the same form as perlmonks, there is a whole lot of information about MD5 readily available on the web. For instance, see the website of the developers of the MD5 algorithm, RSA (www.rsasecurity.com). They have a FAQ that discusses MD5, and guess what? The FAQ discusses the vulnerabilities as well.

        Abigail

      Having said that, your question lacks anything Perl specific. You've more chance of getting a useful answer in a more appropriate forum - md5 isn't language specific.

      I agree with what Abigail-II said: you do have a better chance of getting a useful answer in another forum. However, what's the opinion of the Monks about OT posts to PM? I've always felt that one of the strengths of PM was not just the opinions regarding perl, but also the opinions regarding problem solving, sw design, experiences with X, etc.

      Naturally it would be a shame to see PM flooded with non-perl noise and cease to be an excellent resource for perl programming... but does that mean we should exclude questions that aren't strictly related to the use of perl?

      Also the following thread demonstrates that people here do enjoy sounding off about non-perl topics... I dunno if that reflects that Monks are chatty folks or something else.

      Thoughts/recommendations? I recently did a writeup that was not strictly perl and was rewarded with 0 replies (as of 5pm EST yesterday), so it's certainly author beware. update: I got some replies after I went home. :) I just want to know if that kind of thing should be frowned upon (and those kinds of writeups not be approved by the Powers-That-Be).

      AH

Re: Why applying MD5 hash twice?
by Anonymous Monk on Sep 10, 2003 at 16:28 UTC
    The problem is that the available iterative hash functions are vulnerable to length extension attacks. MD5 and SHA-1 construct the hash by iterating over blocks of data, using the state left by earlier blocks to process later ones, and the final hash is simply that state. So, knowing only the original hash and the length of the original message, it is possible to construct a new message, made by appending extra data to the end of the original, together with a valid hash for it. Part of the extra data is padding that looks like random junk, but it can be calculated.
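    To see where the "random junk" comes from, here is a sketch of MD5's padding rule from RFC 1321: append a 0x80 byte, then zero bytes until the length is 56 mod 64, then the original length in bits as a 64-bit little-endian integer. If a server signs MD5(secret . message), an attacker who knows the digest and the combined length can submit message . padding . extra and compute a matching digest by resuming MD5 from the published state. The helper name md5_padding and the sample secret and ticket are hypothetical, and the sketch only builds the forged message; it does not carry out the hash continuation itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the padding MD5 (RFC 1321) appends to a message of
# $len bytes: one 0x80 byte, zero bytes up to 56 mod 64, then the length
# in bits as a 64-bit little-endian integer.
sub md5_padding {
    my ($len) = @_;
    my $zeros = (55 - $len) % 64;    # Perl's % is non-negative for a positive modulus
    my $bits  = $len * 8;
    # Build the 64-bit count from two 32-bit halves so it also works on
    # perls without 64-bit integers.
    my $count = pack('V V', $bits % 2**32, int($bits / 2**32));
    return "\x80" . ("\0" x $zeros) . $count;
}

# Hypothetical ticket signed as MD5($secret . $message). The attacker never
# sees $secret, only needs its length to compute the glue padding.
my $secret  = 'k3y';
my $message = 'user=alice';
my $glue    = md5_padding(length($secret . $message));
my $forged  = $message . $glue . ';admin=1';

printf "glue padding (%d bytes): %s\n", length $glue, unpack('H*', $glue);
printf "secret + message + glue = %d bytes\n", length($secret . $message . $glue);
```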

    One simple solution is to include the length of the message in the hash computation: H(K, L, M). This protects the length from being tampered with. MD5 puts the length at the end, where it is vulnerable.
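    A hedged sketch of the H(K, L, M) idea with Digest::MD5. The key is assumed to be a fixed server-side secret, and a 32-bit length field sits between it and the message, so appending data changes the length the verifier encodes near the start of its input, and the verifier's computation no longer begins from the state an attacker could extend. The names make_tag and check_tag and the 32-bit big-endian length field are my own illustrative choices; the usage lines only show that a modified message fails verification, they do not mount an actual attack. For real code, the HMAC route suggested at the end of this reply is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helpers implementing H(K, L, M): key, then the message
# length as a 32-bit big-endian integer, then the message itself.
sub make_tag {
    my ($key, $msg) = @_;
    return md5_hex($key . pack('N', length $msg) . $msg);
}

sub check_tag {
    my ($key, $msg, $tag) = @_;
    return make_tag($key, $msg) eq $tag;
}

my $key = 'server secret';    # hypothetical fixed key
my $msg = 'user=alice';
my $tag = make_tag($key, $msg);

print "original accepted: ", (check_tag($key, $msg, $tag) ? "yes" : "no"), "\n";
print "extended accepted: ", (check_tag($key, "$msg;admin=1", $tag) ? "yes" : "no"), "\n";
```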

    Another solution is to validate the message by parsing it. If the parser finds random junk at the end, then you know it has been tampered with. However, the important authentication data is safe.
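    A minimal sketch of the parse-and-reject idea. The ticket layout (user, expires and ip fields separated by semicolons) is a hypothetical format invented for illustration, not the one from the book. The pattern is anchored with \A and \z, so any trailing bytes, such as MD5 glue padding followed by extra fields, make the parser reject the ticket. Checking the hash itself is assumed to happen separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ticket format: user=NAME;expires=EPOCH;ip=DOTTED.QUAD
# \A and \z anchor the pattern to the whole string, so nothing may
# precede or follow the expected fields (not even a trailing newline).
my $TICKET_RE = qr/\Auser=(\w+);expires=(\d+);ip=(\d{1,3}(?:\.\d{1,3}){3})\z/;

sub parse_ticket {
    my ($data) = @_;
    my ($user, $expires, $ip) = $data =~ $TICKET_RE or return;
    return { user => $user, expires => $expires, ip => $ip };
}

for my $candidate (
    'user=alice;expires=1063123200;ip=10.0.0.1',
    "user=alice;expires=1063123200;ip=10.0.0.1\x80\0\0;admin=1",
) {
    my $ticket  = parse_ticket($candidate);
    my $verdict = $ticket ? "accepted for $ticket->{user}" : 'rejected';
    print "$verdict\n";
}
```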

    Finally, you can compute the hash twice. The best construction is: H(K, H(K, M)). The simplest solution is to use Digest::HMAC. This isn't expensive to compute because the second hash is done over a small amount of data.
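    For reference, the standard HMAC construction (RFC 2104) refines the H(K, H(K, M)) shape: the inner and outer passes use the key XORed with two different constants (ipad = 0x36, opad = 0x5c). The sketch below implements HMAC-MD5 with only the core Digest::MD5 module so the two passes are visible; the name my_hmac_md5_hex is hypothetical. In practice, Digest::HMAC (or the hmac_sha256_hex function in Digest::SHA) does this for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

use constant BLOCK_SIZE => 64;    # MD5 processes 64-byte blocks

# HMAC-MD5 as defined in RFC 2104:
#   H((K xor opad) . H((K xor ipad) . M))
sub my_hmac_md5_hex {
    my ($key, $msg) = @_;

    # Keys longer than one block are hashed first; then zero-pad to a block.
    $key = md5($key) if length($key) > BLOCK_SIZE;
    $key .= "\0" x (BLOCK_SIZE - length $key);

    # String-wise XOR of the padded key with the repeated pad bytes.
    my $ipad_key = $key ^ ("\x36" x BLOCK_SIZE);
    my $opad_key = $key ^ ("\x5c" x BLOCK_SIZE);

    # The inner hash covers the whole message; the outer hash covers only
    # 64 + 16 bytes, which is why the second pass is cheap.
    return md5_hex($opad_key . md5($ipad_key . $msg));
}

my $tag = my_hmac_md5_hex('Jefe', 'what do ya want for nothing?');
print "$tag\n";
```

    For the key "Jefe" and message "what do ya want for nothing?", RFC 2104 lists the digest 750c783e6ab0b503eaa86e310a5db738, which makes a handy self-check.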

      Thanks!
      In my case, I validate the data by parsing the message before calculating the hash. I assumed that with this approach we can't receive data that has been tampered with at the end.

Re: Why applying MD5 hash twice?
by John M. Dlugosz (Monsignor) on Sep 10, 2003 at 20:54 UTC
    I read something in Applied Cryptography that sounds a lot like this. It points out a problem with current hash algorithms, and says it can be fixed by running the data through twice. Since that requires the entire string to be buffered, the next best thing is to hash twice (H(H(data))). Problem is, that cuts the security in half, so you should use SHA-256.
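    A sketch of the double-hash construction H(H(data)) with SHA-256 from the core Digest::SHA module; the helper name sha256d_hex is hypothetical. The inner pass reads the data once (it could also be streamed through the module's object interface), and the outer pass hashes only the 32-byte inner digest, so nothing needs to be buffered twice. Here the outer hash is taken over the raw inner digest rather than its hex form; both are possible designs, but they give different results, so pick one and use it consistently. Note that this is unkeyed, so it is not a substitute for HMAC when you need to authenticate a ticket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256 sha256_hex);

# Hypothetical helper: H(H(data)) with SHA-256. The inner call returns the
# raw 32-byte digest; the outer call hashes those bytes and returns hex.
sub sha256d_hex {
    my ($data) = @_;
    return sha256_hex(sha256($data));
}

my $data   = 'some ticket data';
my $single = sha256_hex($data);
my $double = sha256d_hex($data);

print "H(data):    $single\n";
print "H(H(data)): $double\n";
```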