Yikes - I must have had a math-dyslexic moment. You are of course correct: there are 2^128, or about 3.4e+38, possible results from the MD5 hash. My "theoretically" was my too-concise attempt at your "[not proven to be] equally likely to be generated".

I do have a related anecdote: I worked on a site where users uploaded video clips, supposedly original content that they personally recorded. Some of them attempted to cheat by changing the filename and uploading dupes in all but name to increase their stats.

One counter-measure I implemented was storing MD5 hashes for each clip. Generating hashes for the existing clips - over 250,000 of them - took many days. Newly uploaded clips, of course, had theirs generated on the fly.
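For anyone wanting to try the same kind of check, here is a minimal sketch in Python. It is not the code the site used; the function names `md5_of_file` and `register_clip`, and the in-memory dict standing in for the hash store, are hypothetical and chosen only for illustration. The file is read in chunks so that large video clips need not fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_clip(path, seen_hashes):
    """Record a clip's hash; return the path of an earlier clip with the
    same hash, or None if this content has not been seen before.

    seen_hashes is a plain dict (hash -> first path seen), a stand-in for
    whatever persistent store a real site would use.
    """
    clip_hash = md5_of_file(path)
    earlier = seen_hashes.get(clip_hash)
    if earlier is None:
        seen_hashes[clip_hash] = path
    return earlier
```

Because the hash depends only on the file's bytes, renaming a clip does not change the result, which is the whole point of the check.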

Within days of implementing the hash check, a user complained that his clip was tagged as a duplicate, but he swore he'd never uploaded it before. Turned out he was correct. It was not a hash collision; he was trying to upload a clip that someone else had previously put into the system. (IIRC, neither of them was the actual owner...)

In the months that followed, as tens of thousands more files were uploaded, every occurrence of duplicate hashes turned out to be duplicate files.

Granted, fewer than 500,000 files versus 3.4e+38 possible hashes is far from a definitive test, but I think it's safe to say that the chances of a hash collision are vanishingly remote.
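To put a rough number on "vanishingly remote", the usual birthday approximation can be sketched as below. It assumes MD5 outputs behave like uniformly random 128-bit values, which, as noted above, is not proven, and it says nothing about collisions that someone constructs on purpose. The function name `collision_probability` is made up for this example.

```python
import math

HASH_SPACE = 2 ** 128  # number of distinct MD5 digests, about 3.4e+38


def collision_probability(n_items, space=HASH_SPACE):
    """Approximate chance that at least two of n_items random hashes collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    expm1 keeps precision when the exponent is extremely small.
    """
    exponent = n_items * (n_items - 1) / (2 * space)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(f"{collision_probability(500_000):.1e}")
```

For 500,000 files this comes out on the order of 1e-28, so under that assumption an accidental collision in a collection this size is not something one would expect to see.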


In reply to Re^6: Assistance with file compare by keszler
in thread Assistance with file compare by Karger78
