in reply to Using MD5 and the theory behind it

I'll give you an example of usage from my own real-world experience. I maintain a database of MP3s that I use for broadcasting. Every 2 hours a Perl script (loadmusic) runs through my MP3s looking for new, updated, moved and deleted files.

To determine if an MP3 is the same, I use an MD5 checksum of the file. That way I can apply the following logic:

So, I use it to "link" files on the HD to entries in the database. Since the MD5 sum is unique for every file, it works as the perfect identifier (ed.).
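To make the linking idea concrete, here is a minimal sketch in Python. It is not the loadmusic script: the names `md5_of_file` and `classify`, and the plain `{md5: path}` dict standing in for the database, are all assumptions made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(known, root):
    """Compare the MP3s under root with known, a {md5: path} dict.

    known is a hypothetical stand-in for the database table.
    Returns three dicts keyed by digest: new, moved and deleted.
    """
    seen = {}
    for path in sorted(Path(root).rglob("*.mp3")):
        seen[md5_of_file(path)] = str(path)

    # Digest never seen before: a new file (or new content at an old path).
    new = {d: p for d, p in seen.items() if d not in known}
    # Same digest, different path: the file was moved or renamed.
    moved = {d: (known[d], p) for d, p in seen.items()
             if d in known and known[d] != p}
    # Digest in the database but no longer on disk.
    deleted = {d: p for d, p in known.items() if d not in seen}
    return new, moved, deleted
```

In this sketch an updated file shows up as a "deleted" digest plus a "new" digest at the same path, so a caller could pair those two up to detect updates. Two byte-identical files share one digest and would collapse into a single entry.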

In response to ichimunki: Absolutely correct! Of course what I meant to say was "virtually unique" :)

Re: Re: Using MD5 and the theory behind it
by ichimunki (Priest) on Jan 10, 2001 at 23:22 UTC
    Although I'm certain that this approach works, and will continue to work, MD5 sums are not unique for every file. If they were, this would be the ultimate compression algorithm (that is, if the MD5 were unique, you could use it to reverse engineer the file using only the hash, because each hash would have only one possible antecedent). The odds of two similar files having the same MD5 sum, however, are very low.
      Using one of these approximations, it looks like the probability of a birthday collision will finally hit 0.5 by about the time mr.nick has processed his 22 million million millionth MP3, so I'd agree that he has nothing to worry about for now. ;)
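      For anyone who wants to check that figure, here is a rough sketch in Python using the common birthday approximation p ≈ 1 - exp(-n² / 2^(b+1)) for b-bit digests. This may not be the same approximation referred to above, and it assumes digests behave like uniformly random 128-bit values, so it says nothing about deliberately constructed collisions. The function names are made up for this example.

```python
import math

HASH_BITS = 128  # size of an MD5 digest in bits


def collision_probability(n, bits=HASH_BITS):
    """Approximate chance of at least one collision among n random digests."""
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(n * n) / 2.0 ** (bits + 1))


def items_for_probability(p, bits=HASH_BITS):
    """Approximate number of digests needed to reach collision chance p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


print(f"{items_for_probability(0.5):.3g}")  # prints 2.17e+19
```

That is roughly 22 million million million files, which agrees with the number quoted above. For a collection of a million MP3s, `collision_probability(10**6)` comes out on the order of 1e-27 under the same assumption.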