
15 characters at 8 bits per character is 120 bits. So the hash maps 2**120 15-character strings into a set of 2**128 hashes. Plus 2**112 14-character strings. And so on. Assuming that the hashing algorithm does a good job of making a given string look like it goes to a random hash, the probability of a given string having a collision is about 1/2**8 = 1/256. (Let's ignore the relatively small adjustments from the shorter strings and from collisions among them.) If there isn't a collision, then there is only one reverse lookup under 16 characters, and it is yours.

That means that the odds that you are the shortest reverse lookup are about 99.6%, which qualifies as "probably" in my books.
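For concreteness, here is a back-of-the-envelope sketch of that calculation in Perl, assuming an ideal hash that behaves like a random function and approximating the number of colliders as Poisson:

    use strict;
    use warnings;

    # All the strings shorter than 16 characters, other than yours:
    # 2**120 + 2**112 + ... ~= 2**120 * 256/255
    my $others   = 2**120 * 256 / 255;
    my $hashes   = 2**128;                # possible 128-bit hash values
    my $expected = $others / $hashes;     # expected number of short strings sharing your hash
    my $p_alone  = exp(-$expected);       # Poisson chance that none of them do

    printf "expected colliders: %.5f (about 1/256)\n", $expected;
    printf "chance yours is the only short preimage: %.4f\n", $p_alone;

It prints an expected collider count of about 0.0039 and a survival chance of about 0.996, i.e. the 99.6% above.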

At 16 characters the math gets much more complicated. The problem here is that the strings could map onto all of the hash values if they spread out perfectly, but they clump. There are hash values that have multiple strings hashing onto them, and those clumps mean that elsewhere there are hash values that nothing maps onto. The number of strings hashing to a single value follows a Poisson distribution with a rate of about 1.004 (the 2**128 sixteen-character strings, plus all of the shorter ones, divided by 2**128 hash values). So about 36.8% of the hash values have exactly one string map onto them, which means that a similar percentage (about 36.6%) of strings can theoretically be recovered from the hash value.
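A similar sketch for the 16-character case, counting every string of 1 to 16 eight-bit characters as a candidate preimage and again assuming a random-looking hash:

    use strict;
    use warnings;

    # Candidate preimages: every string of 1..16 eight-bit characters.
    my $strings = 0;
    $strings += 2**(8 * $_) for 1 .. 16;
    my $rate = $strings / 2**128;             # Poisson rate, roughly 256/255 ~= 1.004

    my $one_preimage = $rate * exp(-$rate);   # hash values hit by exactly one string
    my $recoverable  = exp(-$rate);           # a string is the sole preimage of its hash
                                              # iff no other string hits the same hash

    printf "rate: %.4f\n", $rate;
    printf "hash values with exactly one preimage: %.1f%%\n", 100 * $one_preimage;
    printf "strings recoverable as unique preimages: %.1f%%\n", 100 * $recoverable;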

With 17-character strings the number of strings so overwhelms the number of hash values that there are always collisions, usually hundreds of them.
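The same arithmetic shows why, treating only the 17-character strings and again assuming a random-looking hash:

    use strict;
    use warnings;

    # 2**136 seventeen-character strings into 2**128 hash values.
    my $rate     = 2**136 / 2**128;   # average preimages per hash value: 256
    my $p_unique = exp(-$rate);       # chance a given 17-character string escapes collision

    printf "average 17-character preimages per hash value: %.0f\n", $rate;
    printf "chance a given string has no collision: %.1e\n", $p_unique;

The no-collision chance comes out around 1e-111, which is "never" for practical purposes.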

The typeable calculation is similar, except that you lose a bit more than one bit per character because most of the 256 possible characters are not typeable. I was too lazy the first time around to work out exactly how much more than a bit per character you lose.
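Here is a sketch of that missing step, assuming "typeable" means the 95 printable ASCII characters (space through tilde); a different character set only shifts the numbers slightly:

    use strict;
    use warnings;

    my $typeable = 95;                        # assumed: printable ASCII, space through ~
    my $bits     = log($typeable) / log(2);   # entropy per typeable character

    printf "bits per typeable character: %.2f\n", $bits;              # ~6.57
    printf "bits lost versus a full 8-bit byte: %.2f\n", 8 - $bits;   # ~1.43
    printf "typeable characters needed to cover 2**128 hashes: %.1f\n", 128 / $bits;

So you lose roughly 1.4 bits per character, which pushes the point where strings start to outnumber hash values out from 16 to somewhere around 19 or 20 typeable characters.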