Re (tilly) 3: What data is code?

Um, there is no problem.

If you attempt to compress already compressed and/or encrypted data, you should not get significant further compression. If you do, then your original encryption or compression was of pretty poor quality. But if you take realistic text (e.g. program code) and use popular compression algorithms, you will reliably get significant compression. (Which is why people use them in the first place.)

Therefore, by taking some text and attempting to compress it, you can tell normal data from compressed or encrypted data. For instance, you might say that if you can reduce its length by 20% or more, it is normal text. Aside from a few short text sequences, you are unlikely to go wrong with a test like this.

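There is, however, no way upon casual inspection to distinguish compressed data from encrypted data from white noise. The reason comes from information theory: good compression and good encryption both produce output with close to the maximum entropy per byte, so both look statistically like random bytes.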
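A rough sketch of that last point, assuming Compress::Zlib is installed. The `byte_entropy` helper and the token list are hypothetical names made up for illustration, and a byte histogram is only one crude measure of randomness. It prints the bits per byte for some text, for the same text compressed, and for random noise of the same length as the compressed copy:

```perl
use strict;
use warnings;
use Compress::Zlib;   # provides compress()

# Shannon entropy of the byte histogram, in bits per byte (0 to 8).
# Hypothetical helper, not part of any module.
sub byte_entropy {
    my ($data) = @_;
    my $len = length($data) or return 0;
    my %count;
    $count{$_}++ for unpack 'C*', $data;
    my $bits = 0;
    for my $n (values %count) {
        my $p = $n / $len;
        $bits -= $p * log($p) / log(2);
    }
    return $bits;
}

# Hypothetical stand-in for program text: random Perl-looking tokens.
my @tokens = ('my', '$total', '=', '$price', '*', '$count', ';', 'print', 'foreach', "\n");
my $text   = join ' ', map { $tokens[ rand @tokens ] } 1 .. 20_000;
my $packed = compress($text);
my $noise  = join '', map { chr(int(rand(256))) } 1 .. length($packed);

for my $row ([ text => $text ], [ compressed => $packed ], [ noise => $noise ]) {
    my ($name, $data) = @$row;
    printf "%-10s %7d bytes  %.2f bits/byte\n", $name, length($data), byte_entropy($data);
}
```

The text should score well below 8 bits per byte, while the compressed copy and the noise should land close to each other near the top of the scale, which is why a byte count alone cannot tell them apart.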


Re: Re (tilly) 3: What data is code? by Beatnik (Parson) on Nov 22, 2001 at 04:22 UTC
The problem I spotted: if you detect that the data is not encrypted, you encrypt it, write it to file and exit the code. Hence the actual code you want to encrypt is never executed. Also, suppose you can somehow detect that the code is encrypted (after, uhm, let's say pass 3); then you'd have to decrypt it X times (e.g. 3 passes).

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re (tilly) 5: What data is code? by tilly (Archbishop) on Nov 22, 2001 at 05:11 UTC
What are you talking about? I am comparing your response to what I wrote, and I am left wondering if you even tried to understand what I said. Certainly you didn't "spot" what you think you did in what I said. Allow me to demonstrate with sample code:

```perl
# Assume you have
use Compress::Zlib;
use Carp;

# Here is the function.  It takes text, and returns 3
# possible answers.  1 if the text looks unencrypted.
# 0 if it looks encrypted.  And undef if it is not able
# to tell reliably.
sub text_is_normal {
  my $text = shift;
  length($text) < 50
    ? undef
    : (length(compress($text)) < 0.8*length($text) ? 1 : 0);
}
```

There you are. Pass this function some text. It returns one of three answers: the text is normal, uncompressed and unencrypted data; the text is compressed or encrypted; or the text is too short to tell reliably. Good luck finding any reasonable code which fools it.

Do whatever you like with the answers and that code, including reducing 50 (which is a rather pessimistic bound). There are no questions about taking 3 passes at it. There is nothing about writing this data anywhere. There is just a function that answers the question you asked at the start of this thread.
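For anyone who wants to poke at it, here is a minimal usage sketch, assuming Compress::Zlib is installed. The function is repeated so the snippet runs on its own; the token list and sample names are hypothetical and only stand in for real program text:

```perl
use strict;
use warnings;
use Compress::Zlib;   # provides compress()

# The same ratio test, repeated here so the snippet runs on its own.
sub text_is_normal {
    my $text = shift;
    return undef if length($text) < 50;                            # too short to judge
    return length(compress($text)) < 0.8 * length($text) ? 1 : 0;  # shrank by over 20%?
}

# Hypothetical stand-in for program text: random Perl-looking tokens.
my @tokens = ('my', '$total', '=', '$price', '*', '$count', ';', 'print', 'foreach', "\n");
my $source = join ' ', map { $tokens[ rand @tokens ] } 1 .. 500;

my @samples = (
    [ 'plain text'      => $source ],
    [ 'compressed once' => compress($source) ],
    [ 'too short'       => 'print 1;' ],
);

for my $sample (@samples) {
    my ($name, $data) = @$sample;
    my $verdict = text_is_normal($data);
    my $label = !defined $verdict ? 'too short to tell'
              : $verdict          ? 'looks like normal text'
              :                     'looks compressed or encrypted';
    printf "%-16s %5d bytes  %s\n", $name, length($data), $label;
}
```

Note that the test only looks at the ratio between the compressed and original lengths, so it does not need to know anything about how large the output of a particular encryption routine will be.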
Re: Re (tilly) 5: What data is code? by Beatnik (Parson) on Nov 22, 2001 at 15:04 UTC
I apparently missed the point... but again, the intention is to encrypt the code, not to compress it. The length is AFAIK not a hint as to whether a stream is encrypted or not. I can't control how encryption routines work (I'm using Crypt::CBC), and I can't predict how large their output will be. Hard-coding maximum sizes is definitely not handy.

Update: Since I can't control the encryption process itself, there is no way I can guarantee that an encrypted compressed block is larger than a non-encrypted compressed block. Adding the compression overhead might turn out not to be a good idea.

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re (tilly) 7: What data is code? by tilly (Archbishop) on Nov 22, 2001 at 22:46 UTC
Re: Re (tilly) 7: What data is code? by clemburg (Curate) on Nov 23, 2001 at 15:34 UTC