So with this clearer statement of your actual problem, we can see there's a statistical method you can use to determine if a collection of bytes is less random than expected. And I may be better able to help you nail down the simplest method than a smart person is precisely because I don't know mathematics or statistics very well.

In an encrypted file, each of the 256 bytes from 0 through 255 will occur about the same number of times. They won't occur the exact same number of times, of course, but they'll mostly be very close in frequency. (This is one of your stated assumptions.) You can easily measure the maximum variance from the mean of the frequencies of one or more example encrypted files. I remember learning the word "epsilon" a few years ago. I think it applies here. You compute a useful epsilon to use to determine if one or more bytes of an encrypted file occur more or less frequently than expected. Wild outliers imply corruption.

I used the word "variance" above. I think standard deviation is a measure of statistical variance. (I'm not going to google it now. I'm winging this explanation on intuition and poor memory.) I think of the epsilon I described above as being the result of computing the greatest percentage difference from the mean of the furthest outlier from the mean in a viable encrypted file. I don't know enough about standard deviation to know if it has anything to do with my naïve conception of "percentage difference from the mean." But I suspect it does.


In reply to Re^5: Calculating corruption by Jim
in thread Calculating corruption by james28909

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.