in reply to Re^4: Calculating corruption
in thread Calculating corruption

that, along with calculating entropy + byte-for-byte repetition checking + the percentage of how many times each byte value occurs in said file, will go a long way i think :)
You seem to assume that your encrypted file is more or less a stream of random characters, and thus that any "deviation" from such "randomness" indicates corruption.

This, of course, is a false assumption. There is neither a need nor a reason for an encrypted file to look anything like random noise.

Consider the unbreakable encryption of the "one-time pad", in other words, encrypting the message with a key that is at least as long as the message itself. Unless you have access to the key, the encrypted file can look like anything at all, yet it can never be decrypted. There is absolutely no way to tell a properly encrypted file from a corrupted one, since any string of characters can decrypt to any message of the same length. It all depends on the content of the key.
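
A minimal sketch of that point, assuming a plain XOR one-time pad (the messages and the helper name `xor_bytes` are made up for illustration): the same ciphertext "decrypts" to two entirely different messages, depending only on which key you apply.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length (the one-time-pad operation)."""
    if len(a) != len(b):
        raise ValueError("key and message must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))   # random key, as long as the message
ciphertext = xor_bytes(message, key)

# The real key recovers the real message.
recovered = xor_bytes(ciphertext, key)

# A different key of the same length turns the very same ciphertext
# into a different, equally plausible message.
decoy = b"RETREAT AT SIX"
decoy_key = xor_bytes(ciphertext, decoy)
alternative = xor_bytes(ciphertext, decoy_key)

print(recovered)     # the original message
print(alternative)   # the decoy message
```

Without the key, nothing in the ciphertext favours one reading over the other, which is also why flipping a few bytes of it does not make it look "wrong".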

If your encrypted file shows certain characteristics, the lack of which indicates corruption, then the original encryption was by definition less secure.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Re^6: Calculating corruption
by james28909 (Deacon) on Oct 19, 2014 at 14:34 UTC
    proper crypto would be pretty darn random if implemented right. i am no guru or anything, but i do know some things about the cat and mouse game. the whole point of encrypting a message or a file is to make it look as much like a random stream of bytes as possible. proper ECDSA with a randomized key is something that is very hard to break, and from what i understand it would take multiple supercomputers thousands of years to break the encryption of a single file <.<

    but yeah, to me, when crypto is implemented right, the stream of bits generated is pretty random, because it would not be a good thing to be dependably predictable ;) at least not for this corporation anyway lol.

    and also like i said, the std dev of these files is always within a certain range, and if it is off by 1 to 1.5%, then that usually means the file is corrupt. once i sit down and code the script to compute some statistics of said files, i will post a zip full of these files and you can try for yourself :)
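
For reference, the statistics mentioned in this thread (entropy, per-byte percentages, standard deviation, byte repetition) could be computed along these lines. This is only a sketch: the function name `byte_stats` and the choice of figures are assumptions, not the poster's actual script, and the numbers describe how uniform the bytes look, nothing more.

```python
import math
import os
from collections import Counter


def byte_stats(data: bytes) -> dict:
    """Return simple byte-level statistics for a blob of data."""
    if not data:
        raise ValueError("no data to analyse")
    n = len(data)
    counts = Counter(data)

    # Shannon entropy in bits per byte (8.0 means perfectly uniform).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Mean and population standard deviation of the byte values.
    mean = sum(data) / n
    std_dev = math.sqrt(sum((b - mean) ** 2 for b in data) / n)

    # Longest run of one byte value repeated back to back.
    longest = current = 1
    for prev, cur in zip(data, data[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)

    # Percentage of the file taken up by each byte value that occurs.
    percentages = {value: 100.0 * c / n for value, c in counts.items()}

    return {
        "entropy": entropy,
        "std_dev": std_dev,
        "longest_run": longest,
        "percentages": percentages,
    }


stats = byte_stats(os.urandom(4096))
print(round(stats["entropy"], 3), round(stats["std_dev"], 1), stats["longest_run"])
```

To run it on a real file, read the file in binary mode and pass the bytes to `byte_stats`.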
      the whole point of encrypting a message or a file is to make it look as much like a random stream of bytes as possible.

      Actually, no. The whole point of encrypting a message is to ensure that no information about the plaintext can be derived from the encrypted file. This does not entail that every sequence of bytes is equally likely, only that every sequence of bytes that could result from an encryption (*) is equally likely, given a particular expected distribution of plaintext messages, which may not actually be uniform (**).

      (*) If, for example, the encrypted files always begin with a prefix specifying the encryption method, or have a checksum/hash field that always matches the rest of the ciphertext, then only certain strings are possible. (And this answers the poster's original question of how you tell there's corruption, i.e., if there's an impossible prefix or the checksum is wrong, then you know, and this is the only kind of corruption you can catch.)
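
A rough sketch of the prefix-plus-checksum idea from (*), under assumptions that are not in the thread: the 4-byte marker `ENC1`, the SHA-256 digest, and the function names `wrap` and `is_intact` are all hypothetical.

```python
import hashlib

MAGIC = b"ENC1"                             # hypothetical format prefix
DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes


def wrap(ciphertext: bytes) -> bytes:
    """Store the ciphertext as: prefix + SHA-256 of ciphertext + ciphertext."""
    return MAGIC + hashlib.sha256(ciphertext).digest() + ciphertext


def is_intact(blob: bytes) -> bool:
    """True only if the prefix is present and the stored digest matches."""
    header_len = len(MAGIC) + DIGEST_LEN
    if len(blob) < header_len or not blob.startswith(MAGIC):
        return False                        # impossible prefix or truncated file
    stored = blob[len(MAGIC):header_len]
    body = blob[header_len:]
    return stored == hashlib.sha256(body).digest()


blob = wrap(b"\x8f\x02\x9a\x41 pretend these are encrypted bytes")
damaged = blob[:-1] + bytes([blob[-1] ^ 0x01])   # flip one bit in the last byte

print(is_intact(blob))      # True
print(is_intact(damaged))   # False
```

A bare hash like this only catches accidental damage. Anyone who alters the file on purpose can simply recompute the digest, so guarding against tampering would call for a keyed check such as an HMAC.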

      To be sure, if another of your goals is to efficiently utilize bandwidth, then it will be to your advantage to have as much of your message be random noise as possible (say, by leaving out the checksum and the prefix), depending on how important this second goal is.

      (**) And then imagine how fun things get if, say, your only possible plaintexts are "YES" and "NO" and "YES" is expected to occur 80% of the time.

      proper crypto would be pretty darn random if implemented right. i am no guru or anything, but i do know some things about the cat and mouse game. the whole point of encrypting a message or a file is to make it look as much like a random stream of bytes as possible.
      I do not know why you insist that an encrypted file must look like a random stream of bytes. I can encrypt any message you like in such a way that it becomes a Bible quote or one of Shakespeare's sonnets, and it will be impossible to decrypt unless you have access to the key. And I really mean "impossible", not just millions of years with millions of supercomputers.
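
One way this could work, sketched with a one-time pad (the secret message and the helper name `xor_bytes` are invented for the example): pick the ciphertext first, then derive the key that maps your message onto it.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


cover = b"Shall I compare thee to a summer's day?"   # what the ciphertext should look like
secret = b"MEET AT NOON"

# Pad the secret with spaces so it is as long as the cover text.
padded = secret.ljust(len(cover))

# Choose the key so that encrypting the secret yields exactly the cover text.
key = xor_bytes(padded, cover)
ciphertext = xor_bytes(padded, key)

print(ciphertext)                             # reads as the sonnet line
print(xor_bytes(ciphertext, key).rstrip())    # the secret, recovered with the key
```

Because the ciphertext was fixed before the message was chosen, it reveals nothing about the message; everything hinges on the key staying secret and being used only once.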
      but yeah, to me, when crypto is implemented right, the stream of bits generated is pretty random, because it would not be a good thing to be dependably predictable ;) at least not for this corporation anyway lol.
      You are mixing up two things: "random" and "dependably predictable". Even the best generators of pseudo-random numbers (which is the best you can get unless you resort to inherently random events, such as radioactive decay or a throw of the dice) produce a sequence of bits that is entirely and dependably predictable once you know the seed. And yet it is exactly such "dependably predictable" sequences of bits that form the heart of many good encryption schemes.
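
A small illustration of that determinism, assuming Python's standard `random` module as a stand-in generator (it is not suitable for real cryptography; the function name `keystream` and the seed values are arbitrary): the same seed always reproduces the same "random-looking" bytes.

```python
import random


def keystream(seed: int, length: int) -> bytes:
    """Produce `length` pseudo-random bytes that depend only on the seed."""
    rng = random.Random(seed)   # private generator, unaffected by global state
    return bytes(rng.getrandbits(8) for _ in range(length))


first = keystream(2014, 16)
second = keystream(2014, 16)   # same seed: identical output
other = keystream(2015, 16)    # different seed: different output

print(first.hex())
print(second.hex())
print(first == second, first == other)
```

Stream ciphers rely on the same property: both ends regenerate an identical keystream from a shared secret, even though the stream looks like noise to anyone without it.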
      and also like i said, the std dev of these files is always within a certain range, and if it is off by 1 to 1.5%, then that usually means the file is corrupt. once i sit down and code the script to compute some statistics of said files, i will post a zip full of these files and you can try for yourself :)
      I am not a cryptology specialist, but if all the encrypted files show a statistical similarity within a small margin, I would surely question the strength of this system. It is exactly such "similarities" that give cryptologists their first "breaks" in defeating the encryption. Remember Enigma?

      Of course, an encryption only has to be "good enough". In the army I used encryption systems for tactical messages that were so simple that anyone with more than two brain cells would need only a few hours to decrypt them (or, with a small Perl script, a few minutes). But since they were used only for short messages that would grow stale quickly, they were "good enough". By the time the enemy had broken the message, the information it was hiding would be next to useless to them anyhow.

      CountZero
