Interesting. The literal fixed point, a block of data whose hash is the data itself (i.e. x = h(x)), isn't known to have a solution.
But he's asking for a value that equals the hash of its own 32-character hex representation. That is, x = h(f(x)), where f(x) writes the 128-bit integer x as a hex string.
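For concreteness, here is a minimal sketch of the condition being searched for. Python is my choice here, since the thread names no language; is_fixed_point is an illustrative name, and f(x) is assumed to be the lower-case form:

    import hashlib

    def is_fixed_point(x: int) -> bool:
        """True when the md5 of the hex-string form of x spells x itself."""
        s = format(x, "032x")              # f(x): 32 lower-case hex characters
        return hashlib.md5(s.encode("ascii")).hexdigest() == s

    print(is_fixed_point(0x0123456789abcdef0123456789abcdef))   # almost surely False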
I also notice that he didn't specify that f(x) is one-to-one, although his examples used lower-case letters. An upper-case or lower-case letter carries the same meaning in any position, so each value has anywhere from 1 to 4,294,967,296 (that is, 2^32) string representations, depending on how many of its 32 digits are letters. That widens the pool of candidate strings and increases the probability that such a value exists.
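To make the count concrete: a hex string's number of case-spellings is 2 raised to its count of letter digits. A small sketch (representation_count is an illustrative name):

    def representation_count(hex_str: str) -> int:
        """2**(number of letter digits) distinct upper/lower-case spellings."""
        return 2 ** sum(c in "abcdefABCDEF" for c in hex_str)

    print(representation_count("cafebabe" * 4))   # all 32 are letters: 4294967296
    print(representation_count("01234567" * 4))   # all 32 are digits:  1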
—John
Well, actually, it's just a silly game: the problem space is far too large to be searched with such a naive approach anyway. Building up something like distributed.net might work.
But if the code is posted on their site, it should be correct. Sure, it's slow, but still correct.
Would distributing it work? Well, how fast can each candidate be tested? Increment directly in the string rather than converting from an integer each time, using a lookup table that maps each byte's value to the next, so upper and lower case are both covered. The overhead is trivial except for one pass through the MD5 block primitive; a sketch of that inner loop follows.
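Here is one way that inner loop might look, as a sketch under the assumption of lower-case digits only (extending the successor table to interleave upper and lower case is straightforward):

    import hashlib

    DIGITS = b"0123456789abcdef"
    SUCC = bytearray(256)                      # maps each byte to the next hex digit
    for i, d in enumerate(DIGITS):
        SUCC[d] = DIGITS[(i + 1) % 16]

    def increment(buf: bytearray) -> None:
        """Add 1 to the hex string in place, rippling the carry leftward."""
        i = len(buf) - 1
        while i >= 0:
            old = buf[i]
            buf[i] = SUCC[old]
            if old != ord("f"):                # no wrap, so no carry
                return
            i -= 1

    buf = bytearray(b"0" * 32)
    for _ in range(1000):                      # a real search would run far longer
        increment(buf)
        if hashlib.md5(bytes(buf)).hexdigest().encode("ascii") == bytes(buf):
            print("fixed point:", buf.decode())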
But that primitive is designed to be CPU-intensive. One "F" function uses 3 or 4 primitive operations, which map to CPU instructions, and the accumulate step costs about the same. Half the input block is padding that never changes, but that doesn't help you. The beginning of the string doesn't change either, and that does help: you can pre-load the hashing state and skip the first 7 of the 64 operations. That still leaves 57, each of which takes about 8 machine primitives, and each depends on the previous one, so your superscalar CPU turns into a simple one-at-a-time machine. Call it 500 clocks per attempt; on a 2 GHz processor, that's 4 million attempts per second per core.
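At the API level you can approximate the pre-loading with hashlib's .copy(): hash the unchanging head of the string once, then clone that state for every candidate tail. This only avoids re-feeding the prefix bytes; actually skipping 7 of the 64 operations inside the block, as described above, would need a hand-rolled MD5. The 16/16 split below is an assumption for illustration:

    import hashlib

    PREFIX = b"0000000000000000"           # the 16 leading digits that never change
    base = hashlib.md5(PREFIX)             # state computed once, up front

    def attempt(tail: bytes) -> str:
        """Hash PREFIX + tail without re-feeding the prefix each time."""
        h = base.copy()                    # resume from the saved state
        h.update(tail)                     # only the 16 changing digits
        return h.hexdigest()

    print(attempt(b"0000000000000001"))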
If you get a million cores working on the problem via friends on the net, that's about 4×10^12, or roughly 2^42, attempts per second. 128 − 42 is 86, so about 2^86 seconds to sweep the whole space, or roughly 2×10^18 years. Or, in analogy with calling our distance from the sun 1 AU, I'll call the elapsed time since the Big Bang one UTU (Universal Time Unit). The distributed search would finish in about 200,000,000 UTU.
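Spelled out, with round constants assumed for the year length and the age of the universe:

    ATTEMPTS_PER_SEC = 4e6 * 1e6           # a million cores at 4M attempts/s, ~2**42
    SECONDS = 2.0**128 / ATTEMPTS_PER_SEC  # sweeping the whole space, ~2**86 s
    YEARS = SECONDS / 3.156e7              # seconds per year
    UTU = YEARS / 1.38e10                  # one UTU = years since the Big Bang
    print(f"{SECONDS:.2e} s = {YEARS:.2e} yr = {UTU:.2e} UTU")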
So no, it won't help. Exponential time... doubling your effort only decrements the exponent by 1, so it's just a drop in the bucket.
It's possible to find an MD5 collision in under a minute, due to flaws in the algorithm; a collision is two inputs with the same hash, though, not a preimage of a chosen output. Still, perhaps an algorithmic approach could exploit the same structural flaws to shrink the search space. That's the only way this will get done, short of advanced quantum computers.
—John