Re^2: Calculating corruption
by james28909 (Deacon) on Oct 18, 2014 at 23:32 UTC
well i have been reading some more and came across standard deviation and spread. would this be a pursuable possibility? because you could compare that against all other encrypted files (even though they use a different key) and it should be a similar outcome, right?
std dev /should/ be moderately comparable from encrypted file to encrypted file of the same data, right? or at least within a certain range. if it is out of this certain range, then you can safely say it is more than likely corrupted, right?
well i have been reading some more and came across standard deviation. would this be a pursuable possibility? because you could compare that against all other encrypted files (even though they use a different key) and it should be a similar outcome, right?
Why? (Why would they have a similar StdDev?)
Standard deviation measures deviation from the mean. Given the full (i.e. exhaustive, but necessarily small) set of all the possible datasets of a given size, the variance (and thus the StdDev) of those standard deviations would range, equally distributed, between zero and infinity.
Hence, the StdDev of any single sample -- of anything -- means exactly nothing!
That is, if the inputs are truly 'random', then the standard deviations are uniformly distributed; and thus, completely uninformative.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
i am not trying to argue with you at all, so please don't take it personally :)
but actually, upon further examination, one of the programs i have used in the past has this std dev function. furthermore, there are a few different revisions of this encrypted file, and each of these revisions has an expected outcome. once you compute the file's std dev and compare that with the known expected values, if it is within a generally close range, then that usually means the file is not corrupted. and i am not saying this is the end-all be-all of how to check a file for corruption, but somehow this other program is able to compute it and it is within a reasonably expected range... every time... per revision of the file, unless the file is corrupted. maybe i need to script up something real quick and just check to see what the outcome will be :)
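The quick script mused about above could look something like this. This is only an illustrative sketch (the original program's exact method is unknown, and the "expected range" comparison is hypothetical):

```python
import math

def byte_stddev(data: bytes) -> float:
    """Population standard deviation of the byte values (0-255)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((b - mean) ** 2 for b in data) / n
    return math.sqrt(variance)

# Uniformly random bytes (which well-encrypted data should resemble)
# have a std dev near 255 / sqrt(12), roughly 73.6. A reading far from
# that neighbourhood *may* hint at corruption -- but proves nothing.
```

In practice one would read the file in binary mode and pass its contents to `byte_stddev`, then compare against the per-revision expected value.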
The statistical method you describe to determine the likelihood that a stream of bytes is "corrupted" (i.e., altered in some way from its original state) will only work for a very specific kind of corruption: the kind that results in the assumed randomness of the bytes (due to encryption) being measurably reduced. If this is exactly the kind of corruption you expect and want to identify when it occurs, and you don't expect or want to identify any other kind of corruption, then the statistical method you describe may be useful to you.
Let's say you have an encrypted file that consists of 1,234,567,890 bytes. One arbitrary bit of one arbitrary byte is switched from 0 to 1, or vice versa. The file is now "corrupted" (i.e., altered from its original state). You will never discover this corruption after the fact by any statistical method (guesswork).
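The single-bit-flip point can be demonstrated numerically: over a reasonably large buffer, flipping one bit moves the standard deviation by a vanishingly small amount. A minimal sketch, using `os.urandom` as a stand-in for encrypted data:

```python
import math
import os

def byte_stddev(data: bytes) -> float:
    """Population standard deviation of the byte values (0-255)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((b - mean) ** 2 for b in data) / n)

original = bytearray(os.urandom(100_000))  # stand-in for an encrypted file
corrupted = bytearray(original)
corrupted[12345] ^= 0x80                   # flip one arbitrary bit

# The shift in std dev is tiny -- far too small to ever distinguish
# from ordinary sample-to-sample variation, so this corruption is
# statistically invisible.
delta = abs(byte_stddev(original) - byte_stddev(corrupted))
```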
"You will never discover this corruption after the fact by any statistical method (guesswork)."
yes sir, i completely understand that, and realise there is no way to actually tell if an encrypted file is corrupted in any way, but you can measure certain things to help signify (to a certain extent) whether the file is corrupted or partially corrupted. otherwise you would need the means to decrypt the file and checksum it like said earlier, which will not work because the file cannot be decrypted: the keys are not known and more than likely will never be known. so i am just trying to come up with some methods to check it for any possibility of being corrupt.
the program i used a long time ago computed this std dev from any given file. and for each revision of this file, the std dev was always within a marginal range of the expected outcome. if it was WAY off, then you know the file was probably corrupted.
that, along with calculating entropy + byte-for-byte repetition checking + the percentage of how many times each byte value appears in said file, will go a long way i think :)
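Those additional measures (Shannon entropy and per-byte frequency percentages) are straightforward to compute; here is a minimal illustrative sketch:

```python
import math
from collections import Counter

def byte_stats(data: bytes):
    """Return (Shannon entropy in bits per byte, {byte: percentage})."""
    counts = Counter(data)
    n = len(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    percentages = {b: 100.0 * c / n for b, c in counts.items()}
    return entropy, percentages

# Well-encrypted data should score close to the maximum of 8 bits/byte,
# with each of the 256 byte values appearing at roughly 100/256 ~= 0.39%.
```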