Dear Monks,

here is a small new game someone proposed : The Kember identity.
It proposes a funny challenge : programmatically find a chain whose MD5 is itself. I don't know if it's even possible, but here's my try, submit yours :)

#!/usr/bin/perl use strict; use warnings; use Digest::MD5 qw( md5_hex); for ( my $i=0; $i<340282366920938463463374607431768211455; $i++ ) { my $plain = sprintf "%032x", $i; my $digest =md5_hex($plain); #print "$i $plain $digest\n"; if ( $plain eq $digest ) { print "Winner : $plain \n"; last } if ( "$i" =~ m/.*00000$/ ) { print "$plain $digest\n"; } }

Replies are listed 'Best First'.
Re: A small game : Kember identity
by John M. Dlugosz (Monsignor) on May 11, 2009 at 20:03 UTC
    Interesting. The actual transform on a block of data such that the input is the output; i.e. x=h(x) is an identity, has no solution.

    But, he's asking for the hash of a 32-character hex string representation of the original value. That is, x=h(f(x)) where f(x) expresses the 128-bit integer as a hex string.

    I also notice that he didn't specify that f(x) is one-to-one, although his examples used lower-case letters. You could use an upper-case letter or lower-case for the same meaning in any position. So each value can have anywhere between 1 and 4,294,967,296 (inclusive) string representations. That opens up the ballpark more and increases the probability of such a value existing.

    —John

Re: A small game : Kember identity
by John M. Dlugosz (Monsignor) on May 11, 2009 at 20:11 UTC
    I think that will not work, as the large integer literal, although it does not give any warning, will be truncated to a floating-point value of approximately 3.40282366920938e+038. Likewise, your "++" will stall when the signifcand gets too large that adding 1 will not change the value at all.

    Try it with a high-precision integer package like use bigint.

      Well actually it's just a silly game : the problem space is much too large to be solvable with such a naive approach anyway. Building up something like distributed.net may work.
        But if the code is posted on their site, it should be correct. Sure it is slow, but still correct.

        Would distributing it work? Well, how fast can each one be tested? Increment directly in the string rather than converting each time. Use a lookup table, so each byte's value maps to the next, to cover upper and lower case. Basically the overhead is trivial except for one pass through the MD5 block primitive.

        But, that primitive is designed to be CPU intensive. One "F" uses 3 or 4 primitive operations, which map to CPU instructions. The accumulate step is about the same. Half the input is padding which does not change, but that doesn't help you. The beginning of the string does not change, so that does help. You can pre-load the state of the hashing to skip the first 7 of 64 operations. That still leaves 57, each of which requires about 8 machine primitives, each of which is dependant on the previous so your superscaler CPU turns into a simple one-at-a-time machine. So 500 "clocks" on a 2GHz processor is 4 million attempts per second per core.

        If you get a million cores working on the problem via friends on the net, that's 213 attempts per second. 128−13 is 115, so 2115 seconds to run to completion, or roughly 1027 years. Or, in analogy with calling our distance from the sun 1 AU, I'll call the present elapsed time since the Big Bang one UTU (Universal Time Unit). The distributed problem will terminate in 960000000000000000 UTU.

        So no, it won't help. Exponential time... doubling your effort only decrements the exponent by 1, so it's just a drop in the bucket.

        It's possible to find a collision in under a minute, due to flaws in the algorithm. So perhaps an algorithmic approach is possible, to reduce the search space. That's the only way it will get done, short of advanced quantum computers.

        —John