in reply to MD5 Password Validation

If you're storing the digest directly in a database column, then forget the base64 representation, and just store the 16 byte binary data. You can always prettify it for debugging ex post facto.
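A minimal sketch of the difference, assuming the digest comes from the standard Digest::MD5 module (the sample password is just the one used in this post):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_base64);

my $pw = 'thisismypassword';   # sample input only

my $raw = md5($pw);            # 16 raw bytes, suitable for a binary column
my $b64 = md5_base64($pw);     # 22 characters of base64, no padding

printf "raw: %d bytes, base64: %d chars\n", length $raw, length $b64;

# Prettify the raw bytes only when a human needs to read them.
my $hex = unpack 'H*', $raw;
print "hex for debugging: $hex\n";
```

The hex string is derived from the stored bytes on demand, so nothing is lost by keeping only the binary form.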

You could then treat the digest as a 128 bit vector, and count the bits set. Statistically, for a random digest, about half of them will be on and half will be off. You need to set a cut-off threshold beyond which you consider a digest a fake, e.g. the split between on bits and off bits should never be more lopsided than 56/72 in either direction.
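Here is a rough sketch of that threshold check. The helper name looks_balanced and the 56..72 window are my own illustration of the idea, not a tested cut-off; the bit count relies on the checksum form of unpack.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the number of set bits in a 16-byte value
# falls inside the 56..72 window suggested above.
sub looks_balanced {
    my ($bytes) = @_;
    my $ones = unpack '%32b*', $bytes;   # checksum form of unpack counts set bits
    return $ones >= 56 && $ones <= 72;
}

my $all_zero = "\0"   x 16;   #   0 bits set
my $half     = "\x0f" x 16;   #  64 bits set
my $all_one  = "\xff" x 16;   # 128 bits set

for my $sample ($all_zero, $half, $all_one) {
    printf "%3d bits set: %s\n",
        unpack('%32b*', $sample),
        looks_balanced($sample) ? 'plausible' : 'suspicious';
}
```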

If someone tries to stuff "thisismypassword" in that field, bells are gonna start ringing, because at the very least, the high (8th) bit in each byte of plain ASCII text is never set, which already sets you up with an imbalance.

update: here's some code I hacked up to look at the problem:

    #! /usr/bin/perl -w
    use strict;

    my $pw = shift || 'thisismypassword';
    my $ones = 0;
    foreach( split //, $pw ) {
        my $bitmap = sprintf '%b', ord $_;
        $ones += ($bitmap =~ tr/1/1/);
    }
    print "$ones/128 bits set\n";

Unfortunately, this shows that it's dreadfully easy to come up with a balanced number of on/off bits. Here's another thought: look at what this produces:

    foreach( split //, $pw ) {
        printf "%08b\n", ord $_;
    }

This produces something like:

    01110100
    01101000
    01101001
    01110011
    01101001
    01110011
    01101101
    01111001
    01110000
    01100001
    01110011
    01110011
    01110111
    01101111
    01110010
    01100100

Going down the columns, we would expect to see about half ones and half zeros, as we more or less do in the rightmost columns. That is not what we see in the leftmost columns: for lowercase ASCII the first column is all zeros and the next two are all ones. Ergo, this is almost certainly not an MD5 digest. What you really want to do is run a quick statistical check on those 128 bits, something like a Kruskal-Wallis or Mann-Whitney test (but IANAS).
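Short of a proper statistical test, the column idea can be sketched by tallying each bit position across the 16 bytes. The helper name column_counts is made up for this example, and deciding how far from 8 a column may stray before you call it a fake is left open.

```perl
use strict;
use warnings;

# Hypothetical helper: for each of the 8 bit positions (most significant
# first), count how many bytes of the input have that bit set.
sub column_counts {
    my ($bytes) = @_;
    my @counts = (0) x 8;
    for my $byte (unpack 'C*', $bytes) {
        for my $col (0 .. 7) {
            $counts[$col]++ if $byte & (0x80 >> $col);
        }
    }
    return @counts;
}

my @counts = column_counts('thisismypassword');
print join(' ', @counts), "\n";
```

For the sample string the first three columns come out as 0, 16 and 16. For 16 genuinely random bytes, a given column being all zeros or all ones has probability 2/65536, so even a crude rule that rejects on any such column would catch plain lowercase text.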

--
g r i n d e r