Re^5: Calculating corruption
by CountZero (Bishop) on Oct 19, 2014 at 12:39 UTC
that along with calculating entropy + byte for byte repetition checking + the percentage of how many times each byte character is in said file will go a long way i think :)

You seem to assume that your encrypted file is more or less a stream of random characters, and thus that any "deviation" from such "randomness" indicates corruption. This, of course, is a false assumption. There is no need nor reason for an encrypted file to look anything like random noise. Consider the unbreakable encryption of the "one time pad", in other words a key at least as long as the message it encrypts. Unless you have access to the key, your encrypted file can be anything, but it can never be decrypted. There is absolutely no way you can discern a properly encrypted file from a corrupted file, since any string of characters can mean anything; it all depends on the content of the key. If your encrypted file shows certain characteristics whose absence indicates corruption, then by definition the original encryption was less secure.
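A minimal sketch of that point in Python (the messages and key bytes here are made up purely for illustration): with a one-time pad, one and the same ciphertext decrypts to completely different plaintexts depending on the key, so no statistical test on the ciphertext alone can tell you anything about the message.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (one-time pad encrypt/decrypt)."""
    return bytes(x ^ y for x, y in zip(a, b))

plain1 = b"ATTACK AT DAWN"
key1 = b"\x07\x13\x29\x55\x81\x42\x0c\x66\x90\x1b\x3d\x5e\x70\x22"  # arbitrary pad
ciphertext = xor_bytes(plain1, key1)

# Pick a different key, and the *same* ciphertext "is" a different message:
plain2 = b"HOLD YOUR FIRE"
key2 = xor_bytes(ciphertext, plain2)

assert xor_bytes(ciphertext, key1) == plain1
assert xor_bytes(ciphertext, key2) == plain2
```

Nothing in the ciphertext itself favors one reading over the other; the "meaning" lives entirely in the key.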
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
proper crypto would be pretty darn random if implemented right. i am no guru or anything, but i do know some things about the cat and mouse game. the whole point of encrypting a message or a file is to make it look as much like a random stream of bytes as possible. proper ECDSA with a randomized key is something that is very hard to break, and from what i understand it would take multiple supercomputers thousands of years to break the encryption of a single file <.<
but yeah, to me, when crypto is implemented right, the stream of bits generated is pretty random, cause it would not be a good thing to be dependably predictable ;) at least not for this corporation anyway lol.
and also, like i said, the std dev of these files is all within a certain range, and if they are off by 1 to 1.5%, that usually means the file is corrupt. once i sit down and code the script to compute some statistics of said files, i will post a zip full of them and you can try for yourself :)
the whole point of encrypting a message or a file is to make it look as a random stream of bytes as possible.

Actually, no. The whole point of encrypting a message is to make it so that no information about the plaintext is derivable from the encrypted file. This does not entail that every sequence of bytes is equally likely, only that every sequence of bytes that could result from an encryption (*) is equally likely, given a particular expected distribution of plaintext messages, which may not actually be uniform. (**)
(*) If, for example, the encrypted files always begin with a prefix specifying the encryption method, or have a checksum/hash field that always matches the rest of the ciphertext, then only certain strings are possible. (And this answers the poster's original question of how you tell there's corruption, i.e., if there's an impossible prefix or the checksum is wrong, then you know, and this is the only kind of corruption you can catch.)
To be sure, if another of your goals is to efficiently utilize bandwidth, then it will be to your advantage to have as much of your message be random noise as possible (say, by leaving out the checksum and the prefix), depending on how important this second goal is.
(**) And then imagine how fun things get if, say, your only possible plaintexts are "YES" and "NO" and "YES" is expected to occur 80% of the time.
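The (*) point above is the one practical handle on corruption. A hedged sketch of it, assuming a hypothetical container format (the `ENC1` marker and the digest-then-body layout are invented for illustration, not any real tool's format): the only corruption you can catch is a wrong prefix or a body that no longer matches the redundancy the format carries, here a stored SHA-256 digest.

```python
import hashlib

MAGIC = b"ENC1"  # invented format marker, purely illustrative

def looks_intact(blob: bytes) -> bool:
    """Catch only the corruption the format makes catchable: a wrong
    prefix, or a body that no longer matches its stored SHA-256 digest."""
    if not blob.startswith(MAGIC):
        return False
    digest, body = blob[4:36], blob[36:]
    return hashlib.sha256(body).digest() == digest

body = bytes(range(64))  # stand-in for arbitrary ciphertext bytes
blob = MAGIC + hashlib.sha256(body).digest() + body

corrupt = bytearray(blob)
corrupt[40] ^= 0xFF      # flip one byte inside the body

assert looks_intact(blob)
assert not looks_intact(bytes(corrupt))
```

A format that carries no such redundancy gives you nothing to check against, which is exactly the trade-off the bandwidth remark above describes.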
proper crypto would be pretty darn random if implemented right. i am no guru or anything, but i do know somethings about the cat and mouse game. the whole point of encrypting a message or a file is to make it look as a random stream of bytes as possible.
I do not know why you insist that an encrypted file must look like a random stream of bytes. I can encrypt any message you like in such a way that it becomes a Bible quote or one of Shakespeare's sonnets, and it will be impossible to decrypt unless you have access to the key. And I really mean "impossible", not just millions of years with millions of supercomputers.
but yeah, to me, when crypto is implemented right, the stream of bits generated are pretty random, cz it would not be a good thing to be dependably predictable ;) at least not for this corporation anyway lol.
You are mixing up two things: "random" and "dependably predictable". Even the best generators of pseudo-random number streams (which is the best you can get unless you resort to inherently random events, such as radioactive decay or a throw of dice) provide an entirely and dependably predictable sequence of bits. And yet it is exactly such "dependably predictable" sequences of bits that form the heart of many good encryption schemes.
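That distinction can be seen directly with any seeded PRNG (Python's built-in generator is used here just as an example): the output looks statistically random, yet it is completely reproducible from the seed, and a stream cipher's keystream works on exactly that principle.

```python
import random

def keystream(seed: int, n: int) -> bytes:
    """Pseudo-random but dependably predictable: the seed fixes the stream."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

assert keystream(42, 32) == keystream(42, 32)  # same seed: same bits, always
assert keystream(42, 32) != keystream(43, 32)  # different seed: different bits
```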
and also like i said, the std dev of these files are all within a certain range and usually if they are off by 1 to 1.5%, then that usually means the file is corrupt. once i sit down and code the script to compute some statistics of said files, i will post a zip full of these files and you can try for yourself :)

I am not a cryptology specialist, but if all the encrypted files show a statistical similarity within a small margin, I would surely question the strength of this system. It is exactly such "similarities" that give cryptologists their first "breaks" to defeat an encryption. Remember Enigma? Of course, an encryption only has to be "good enough". In the army I have used encryption systems for tactical messages so simple that it would take anyone with more than 2 braincells only a few hours to decrypt them (or a small Perl script, a few minutes). But since they were used only for short messages that would grow stale quickly, they were "good enough": by the time the enemy breaks the message, the information it was hiding would be next to useless to them anyhow.
Re^5: Calculating corruption
by Jim (Curate) on Oct 19, 2014 at 00:49 UTC
So with this clearer statement of your actual problem, we can see there's a statistical method you can use to determine if a collection of bytes is less random than expected. And I may be better able to help you nail down the simplest method than a smart person is precisely because I don't know mathematics or statistics very well.
In an encrypted file, each of the 256 bytes from 0 through 255 will occur about the same number of times. They won't occur the exact same number of times, of course, but they'll mostly be very close in frequency. (This is one of your stated assumptions.) You can easily measure the maximum variance from the mean of the frequencies of one or more example encrypted files. I remember learning the word "epsilon" a few years ago. I think it applies here. You compute a useful epsilon to use to determine if one or more bytes of an encrypted file occur more or less frequently than expected. Wild outliers imply corruption.
I used the word "variance" above. I think standard deviation is a measure of statistical variance. (I'm not going to google it now. I'm winging this explanation on intuition and poor memory.) I think of the epsilon I described above as being the result of computing the greatest percentage difference from the mean of the furthest outlier from the mean in a viable encrypted file. I don't know enough about standard deviation to know if it has anything to do with my naïve conception of "percentage difference from the mean." But I suspect it does.
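One way to turn that idea into code (a sketch only; the "epsilon" threshold has to be calibrated from several known-good files, and none of the numbers here come from the poster's actual dumps):

```python
import os
from collections import Counter

def max_relative_deviation(data: bytes) -> float:
    """Largest relative deviation of any byte value's count from the mean.
    For uniformly distributed bytes the mean count is len(data) / 256."""
    counts = Counter(data)
    expected = len(data) / 256
    return max(abs(counts.get(b, 0) - expected) / expected for b in range(256))

good = os.urandom(1 << 20)              # stand-in for a known-good encrypted file
epsilon = max_relative_deviation(good)  # calibrate from several good files

suspect = bytes(1 << 20)                # a zero-filled "dump" is a wild outlier
assert max_relative_deviation(suspect) > epsilon
```

The furthest-outlier-from-the-mean measure described above is exactly what `max_relative_deviation` computes; standard deviation would summarize all 256 deviations at once instead of just the worst one.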
yes, you hit the nail on the head i do believe: checking the std dev for the 0x00 - 0xFF byte characters. this, along with the aforementioned calculating entropy, checking the percentage of how many times each byte shows up in a file, etc., will help to determine (within reasonable confidence) if the file is corrupt or not. it is not a 100% accurate way of telling, but i think it is a good way to help, and is exactly what i am after. i am going to read up on standard deviation and try to script up something that will compute it for each of my files. i hope i get the expected results
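Since entropy keeps coming up, here is a minimal Shannon-entropy sketch to go alongside the frequency checks (with the caveat the earlier replies imply: corruption that zeroes or repeats regions drags entropy down, but random bit flips in otherwise random data would not, so it only catches some failure modes):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: ~8.0 for well-encrypted data,
    much lower when regions are zeroed or repetitive."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform = bytes(range(256)) * 16
assert abs(shannon_entropy(uniform) - 8.0) < 1e-9   # perfectly uniform bytes

zeros = bytes(4096)
assert shannon_entropy(zeros) == 0.0                # a zeroed region has none
```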
also, thank everyone for their time and input :)
ps also thanks for helping me figure out what my question should have been too. i really need to start taking the extra few mins to think about my question before i post. apologies
ps also thanks for helping me figure out what my question should have been too. i really need to start taking the extra few mins to think about my question before i post. apologies
You're welcome, and no need to apologize. But, honestly, it would help a lot if you'd fix the broken Shift keys on your computer. ;-)
Re^5: Calculating corruption
by GotToBTru (Prior) on Oct 19, 2014 at 16:01 UTC
If "the keys are unknown and more than likely will never be known" the files cannot be decrypted, so who cares if they are corrupted or not?
while downgrading PS3s, there is some per-console data, and this per-console data is what i am talking about. there is no way to get the keys for the few files i am talking about without destroying hardware. when you dump the flash contents, this very sensitive data is dumped along with it, and like i said many times in this thread already, if one bit (in the billions of bits throughout the dump) is off at all, it will brick the system. that's why it is useful to have many different methods to check this per-console data, because there is no way to decrypt it without destroying hardware to get the keys (afaik). and if you flash back bad data, you have a nice paperweight on your hands that cannot be salvaged EVER.
and i need to say again, i guess: there is no way to tell 100%, this has been established many times already, but the methods used have an expected outcome that has been tried and true on thousands and thousands of these console dumps. if the data falls outside of a certain range on any statistical analysis, then you can place all bets on the dump being bad.
Ah, so the application is jailbreaking PS3s. Nice. Next time you want to involve me in piracy, ask first, okay?