in reply to Using MD5 and the theory behind it

MD5 and other one-way hash functions are designed to take in a string of any length and convert it to a short, fixed-length string, a kind of fingerprint of the original. (CRC32 works on the same fingerprint idea, but it is a checksum rather than a true one-way hash.) Different one-way hash functions produce fingerprints of different lengths; an MD5 digest, for example, is 128 bits. But the following criteria should hold for all good one-way hash functions:

I deal with a good bit of datacomm and file transfers. I use MD5 to identify when I have received suspected duplicate files. I keep a DB table with the MD5 values of all the files that have been transmitted to me. Whenever I get a new file, I compare its MD5 value to those stored in the table. If the value is not in the table, I process the file and store its MD5 value in the table. If the value is in the table, I set the file aside for special handling and notify an operator.
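The duplicate check described above might look roughly like the sketch below. It is only an illustration: the in-memory hash %seen stands in for the DB table, and the names file_md5 and check_incoming are made up for this example rather than taken from any real system.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical stand-in for the DB table: MD5 hex digest => first file seen.
my %seen;

# Return the MD5 hex digest of a file, read in binary mode.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Return 'process' for a file not seen before (and record its digest),
# or 'duplicate' for a file whose digest is already recorded.
sub check_incoming {
    my ($path) = @_;
    my $digest = file_md5($path);
    return 'duplicate' if exists $seen{$digest};
    $seen{$digest} = $path;
    return 'process';
}
```

A caller would process the file on 'process' and set it aside and notify an operator on 'duplicate'. A real setup would replace %seen with a lookup and insert against the table.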

If you really want to learn exactly how MD5 (and other hash algorithms) work, I recommend checking out Applied Cryptography by Bruce Schneier.

Re: Re: Using MD5 and the theory behind it
by r.joseph (Hermit) on Jan 10, 2001 at 06:43 UTC
    You say that you compare "its MD5 value" to the values in a table. How do you get an MD5 value for a file? What exactly do you mean by this process? (I believe it is very similar to what I am attempting.) Thanks for the help!

      For reasonable-sized files (ones that fit comfortably in system memory): load the file's contents into a Perl scalar, say $foo. Then, with md5 imported from Digest::MD5, $fingerprint = md5($foo);
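A minimal sketch of that slurp-then-hash approach follows. The helper names slurp and fingerprint_file are invented for this example, and it uses md5_hex instead of md5 only so the result is printable; md5 itself returns the same digest as 16 raw bytes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Read an entire file into one scalar (fine for files that fit in memory).
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    binmode($fh);
    local $/;                 # undefine the record separator: read it all
    my $contents = <$fh>;
    close($fh);
    return $contents;
}

# Return the file's MD5 digest as 32 hexadecimal characters.
sub fingerprint_file {
    my ($path) = @_;
    return md5_hex(slurp($path));
}
```

For large files, the object-oriented addfile interface avoids holding the whole file in memory.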

      If you look through the documentation you have for it, you'll get some advice on other methods, e.g. the object-oriented version:

          my $file = "/file/to/hash";
          my $md5 = Digest::MD5->new();
          $md5->addfile($file);
          $md5->add("seekrit passwerd"); # not the best choice for one, but ...
          my $digest = $md5->digest;

      I got this straight out of the docs, more or less. HTH

      Philosophy can be made out of anything. Or less -- Jerry A. Fodor

        Small correction:
            my $file = "/file/to/hash";
            my $md5 = Digest::MD5->new();
            open(MD5, $file) || die "Unable to open file: $!\n";
            binmode(MD5);
            $md5->addfile(*MD5);
            $md5->add("seekrit passwerd"); # tee hee
            my $digest = $md5->digest;
        Your original code will not work with the latest Digest::MD5, producing the error "Not a valid filehandle." I know this because I'm currently writing a utility script that uses MD5 to verify downloaded files (for the Slackware distrib, actually) and I tried it your way to no avail. =)
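For the download-verification case, a rough sketch could look like this. The function name verify_md5 is hypothetical, and it assumes the expected value is a hex string of the kind usually published next to a download; it is not the actual utility script mentioned above.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return true if the file's MD5 hex digest matches the expected hex string.
# The comparison ignores letter case in the expected value.
sub verify_md5 {
    my ($path, $expected_hex) = @_;
    open(my $fh, '<', $path) or die "Unable to open file: $!\n";
    binmode($fh);
    my $actual = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return lc($actual) eq lc($expected_hex);
}
```

Note that addfile is given an open filehandle here, not a filename, which is the point of the correction above.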

        'kaboo