in reply to Re^2: Fingerprinting text documents for approximate comparison
in thread Fingerprinting text documents for approximate comparison

The MD5 only turns a list of words into a number. The list of words itself is the fingerprint of the file, so you could just compare the word lists directly. The MD5 is only being used as a checksum of that list.
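To make the point concrete, here is a minimal sketch in Python. It assumes the fingerprint words have already been picked by whatever method the parent post uses. The function names and the sample word lists are made up for illustration and do not come from the thread. Comparing digests and comparing the word lists give the same yes/no answer (barring hash collisions), and keeping the words around also lets you measure partial overlap, which a digest cannot do.

```python
import hashlib


def md5_of_words(words):
    # Join with a newline (assumed not to occur inside a word) and hash.
    # The digest is only a compact checksum of the word list.
    joined = "\n".join(words)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def same_by_digest(words_a, words_b):
    # Equal digests mean equal word lists, except for hash collisions.
    return md5_of_words(words_a) == md5_of_words(words_b)


def same_by_words(words_a, words_b):
    # Compare the fingerprint directly, with no hashing at all.
    return list(words_a) == list(words_b)


def word_overlap(words_a, words_b):
    # Jaccard similarity of the two word sets: 1.0 means identical sets,
    # 0.0 means no words in common. This is one possible approximate
    # measure, not necessarily the one the thread has in mind.
    set_a, set_b = set(words_a), set(words_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical fingerprint word lists for three documents.
    doc_a = ["quick", "brown", "fox"]
    doc_b = ["quick", "brown", "fox"]
    doc_c = ["quick", "brown", "dog"]

    print(same_by_digest(doc_a, doc_b), same_by_words(doc_a, doc_b))
    print(same_by_digest(doc_a, doc_c), same_by_words(doc_a, doc_c))
    print(word_overlap(doc_a, doc_c))
```

The digest is still handy if you only want to store or index one short value per document, but it can only ever tell you "same" or "different".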
-- gam3
A picture is worth a thousand words, but takes 200K.