Re^2: Calculating corruption
by james28909 (Deacon) on Oct 18, 2014 at 23:32 UTC
well i have been reading some more and came across standard deviation and spread. would this be a pursuable possibility? because you could compare that against all other encrypted files (even though they use a different key) and it should be a similar outcome, right?
std dev /should/ be moderately comparable from encrypted file to encrypted file of the same data, right? or at least within a certain range. if it is out of this certain range, then you can safely say it is more than likely corrupted, right?
well i have been reading some more and came across standard deviation. would this be a pursuable possibility? because you could compare that against all other encrypted files (even though they use a different key) and it should be a similar outcome, right?
Why? (Why would they have a similar StdDev?)
Standard deviation measures deviation from the mean. Given the full (i.e. exhaustive, but necessarily small) set of all the possible datasets of a given size, the variance (and thus the StdDev) of those standard deviations would range, equally distributed, between zero and infinity.
Hence, the StdDev of any single sample -- of anything -- means exactly nothing!
That is, if the inputs are truly 'random', then the standard deviations are uniformly distributed; and thus, completely uninformative.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
i am not trying to argue with you at all, so please don't take it personally :)
but actually, upon further examination, one of the programs i have used in the past has this std dev function. furthermore, there are a few different revisions of this encrypted file, and each of these revisions has an expected outcome. once you compute the file's std dev and compare that with the known expected values, if it is within a generally close range, then that usually means the file is not corrupted. and i am not saying this is the end-all be-all of how to check a file for corruption, but somehow this other program is able to compute it and it is within a reasonably expected range... every time... per revision of the file, unless the file is corrupted. maybe i need to script up something real quick and just check to see what the outcome will be :)
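The quick script mused about above could look something like this. This is only an illustrative sketch (the original program's exact method is unknown, and the "expected range" comparison is hypothetical):

```python
import math

def byte_stddev(data: bytes) -> float:
    """Population standard deviation of the byte values (0-255)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((b - mean) ** 2 for b in data) / n
    return math.sqrt(variance)

# Uniformly random bytes (which well-encrypted data should resemble)
# have a std dev near 255 / sqrt(12), roughly 73.6. A reading far from
# that neighbourhood *may* hint at corruption -- but proves nothing.
```

In practice one would read the file in binary mode and pass its contents to `byte_stddev`, then compare against the per-revision expected value.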
The statistical method you describe to determine the likelihood that a stream of bytes is "corrupted" (i.e., altered in some way from its original state) will only work for a very specific kind of corruption: the kind that results in the assumed randomness of the bytes (due to encryption) being measurably reduced. If this is exactly the kind of corruption you expect and want to identify when it occurs, and you don't expect or want to identify any other kind of corruption, then the statistical method you describe may be useful to you.
Let's say you have an encrypted file that consists of 1,234,567,890 bytes. One arbitrary bit of one arbitrary byte is switched from 0 to 1, or vice versa. The file is now "corrupted" (i.e., altered from its original state). You will never discover this corruption after the fact by any statistical method (guesswork).
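The single-bit-flip point can be demonstrated numerically: over a reasonably large buffer, flipping one bit moves the standard deviation by a vanishingly small amount. A minimal sketch, using `os.urandom` as a stand-in for encrypted data:

```python
import math
import os

def byte_stddev(data: bytes) -> float:
    """Population standard deviation of the byte values (0-255)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((b - mean) ** 2 for b in data) / n)

original = bytearray(os.urandom(100_000))  # stand-in for an encrypted file
corrupted = bytearray(original)
corrupted[12345] ^= 0x80                   # flip one arbitrary bit

# The shift in std dev is tiny -- far too small to ever distinguish
# from ordinary sample-to-sample variation, so this corruption is
# statistically invisible.
delta = abs(byte_stddev(original) - byte_stddev(corrupted))
```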
"You will never discover this corruption after the fact by any statistical method (guesswork)."
yes sir, i completely understand that, and realise there is no way to actually tell if an encrypted file is corrupted in any way, but you can measure certain things to help signify (to a certain extent) whether the file is corrupted or partially corrupted. otherwise you would need the means to decrypt the file and checksum it like said earlier, which will not work because the file cannot be decrypted: the keys are not known and more than likely will never be known. so i am just trying to come up with some methods to check it for any possibility of being corrupt.
the program i used a long time ago computed this std dev from any given file. and for each revision of this file, the std dev was always within a marginal range of the expected outcome. if it was WAY off, then you know the file was probably corrupted.
that, along with calculating entropy + byte-for-byte repetition checking + the percentage of how many times each byte value appears in said file, will go a long way i think :)
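Those additional measures (Shannon entropy and per-byte frequency percentages) are straightforward to compute; here is a minimal illustrative sketch:

```python
import math
from collections import Counter

def byte_stats(data: bytes):
    """Return (Shannon entropy in bits per byte, {byte: percentage})."""
    counts = Counter(data)
    n = len(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    percentages = {b: 100.0 * c / n for b, c in counts.items()}
    return entropy, percentages

# Well-encrypted data should score close to the maximum of 8 bits/byte,
# with each of the 256 byte values appearing at roughly 100/256 ~= 0.39%.
```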