Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Re^4: Compressing and Encrypting files on Windows

by jon_barber (Initiate)
on Nov 03, 2004 at 04:56 UTC ( [id://404818] : note . print w/replies, xml ) Need Help??

in reply to Re^3: Compressing and Encrypting files on Windows
in thread Compressing and Encrypting files on Windows

What on earth makes you think data encrypted with a one time pad contains redundant data repeats and can thus be compressed? Please see Elgons note at Re^2: Compressing and Encrypting files on Windows and follow the link.

What makes you think that a OTP pad doesn't contain redundant repeats? As I understand it, a OTP operates by applying randomly generated data (the OTP) to the data that you wish to encrypt. To decrypt it you then reverse the transformation with the same OTP.

Given that the OTP is randomly generated, is it not possible to imagine that it might produce a cyphertext which did contain redundant data repeats and could therefore be highly compressable?

As I stated in my original comment, it was a nitpick over the definition that "a decent encryption algorithm would produce ... noise."

Replies are listed 'Best First'.
Re^5: Compressing and Encrypting files on Windows
by tachyon (Chancellor) on Nov 03, 2004 at 06:12 UTC

    Given that the OTP is randomly generated, is it not possible to imagine that it might produce a cyphertext which did contain redundant data repeats and could therefore be highly compressable?

    No, no and no. You are just plain wrong. Random data by definition contains no repeats. If it does it is not random. OTPs depend on that randomness. As you seem resistant to my suggestion to do some research.....

    What is a One-Time Pad?

    A one-time pad is a very simple yet completely unbreakable symmetric cipher. "Symmetric" means it uses the same key for encryption as for decryption. As with all symmetric ciphers, the sender must transmit the key to the recipient via some secure and tamperproof channel, otherwise the recipient won't be able to decrypt the ciphertext.

    The key for a one-time pad cipher is a string of random bits, usually generated by a cryptographically strong pseudo-random number generator (CSPRNG). It is better to generate the key using the natural randomness of quantum mechanical events (such as those detected by a Geiger counter), since quantum events are believed by many to be the only source of truly random information in the universe. One-time pads that use CSPRNGs are open to attacks which attempt to compute part or all of the key.

    With a one-time pad, there are as many bits in the key as in the plaintext. This is the primary drawback of a one-time pad, but it is also the source of its perfect security (see below). It is essential that no portion of the key ever be reused for another encryption (hence the name "one-time pad"), otherwise cryptanalysis can break the cipher.

    The cipher itself is exceedlingly simple. To encrypt plaintext, P, with a key, K, producing ciphertext, C, simply compute the bitwise exclusive-or of the key and the plaintext:

    C = K^P
    To decrypt ciphertext, C, the recipient computes
    P = K^C
    It's that simple, and it's perfectly secure, as long as the key is random and is not compromised.

    Why Are One-Time Pads Perfectly Secure?

    If the key is truly random, an xor-based one-time pad is perfectly secure against ciphertext-only cryptanalysis. This means an attacker can't compute the plaintext from the ciphertext without knowlege of the key, even via a brute force search of the space of all keys! Trying all possible keys doesn't help you at all, because all possible plaintexts are equally likely decryptions of the ciphertext.

    This result is true regardless of how few bits the key has or how much you know about the structure of the plaintext. To see this, suppose you intercept a very small, 8-bit, ciphertext. You know it is either the ASCII character 'S' or the ASCII character 'A' encrypted with a one-time pad. You also know that if it's 'S', the enemy will attack by sea, and if it's 'A', the enemy will attack by air. That's a lot to know. All you are missing is the key, a silly little 8-bit one-time pad.

    You assign your crack staff of cryptanalysts to try all 256 8-bit one-time pads. This is a brute force search of the keyspace.

    The results of the brute force search of the keyspace is that your staff finds one 8-bit key that decrypts the ciphertext to 'S' and one that decrypts it to 'A'. And you still don't know which one is the actual plaintext.

    This argument is easilly generalized to keys (and plaintexts) of arbitrary length.

    Text taken from here

      sigh. I read the link you originally posted, and with regard to this this discussion, it is irrelevant.

      My point is that when encrypting data with a OTP, there is absolutely no requirement that the cypher text be uncompressible.

      You seem be confusing the concepts of random and compression. And that if something is randomly generated, it can't be compressed.

      To show that this isn't the case, lets do a small thought experiment. We'll use an unbiased coin as a generator of randomness and flip the coin a number of times to generate a series of events. If you flip the coin enough times you would expect the results overall to be statistically indistinguishable, but if you examine sections of the sequence then you might find sections which are repeats, and therefore can be compressed. For example, it's entirely possible to flip 100 heads in a row, which is possible to compress with run length encoding. In fact, the probability of this sequence is exactly the same as all other orderings of 100 flips.

      As a concrete example, lets assume we want to send a message of 100 bits, we need a key length of 100 bits (equivalent to 100 coin flips) to apply to the data, as you pointed out. We'll assume that the message has all the bits set to 0, which could correspond to a particular flag. So lets look at the key. Given that over a small number of events, the key might be compressible, and given that the message is xor'd against the key, the result might be compressible.

      Now you might say that a geiger counter would never produce this kind of data. But imagine that a sun in a distant systems goes supernova, or there's a large solar flame in our Sun, either of which creates a large number of exotic particles, which during the acquisition of the key maxes out our geiger counter. The key will be all the same bits. This would result in a cypher text with the same statistical properties as the plain text, which if it's english text, is compressible.

      Indeed, the cypher text might appear to be english text, irrespective of what the key or the plain text is, and therefore compressible.

      What are the chances of this, flipping small.