in reply to What data is code?

Attempt to compress the text with a widely used compression library, such as Compress::Zlib.

If it compresses significantly, it is unencrypted.

You will need to experiment a bit to find the right threshold, but that test should work pretty well.
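To find a threshold you need to measure how much each kind of input shrinks. Here is a minimal sketch in Python, whose standard zlib module wraps the same zlib library that Compress::Zlib binds to. The helper name space_saving and the sample inputs are hypothetical, and random bytes only stand in for real encrypted output.

```python
import os
import zlib

def space_saving(data: bytes) -> float:
    """Fraction of the input length removed by zlib; negative if the output grew."""
    if not data:
        return 0.0
    return 1 - len(zlib.compress(data)) / len(data)

# Illustrative samples only: replace them with real files from your own setup
# (plain sources and already-encrypted ones) when choosing a threshold.
samples = {
    "perl source": b"""
sub add_item {
    my ($self, $item) = @_;
    push @{ $self->{items} }, $item;
    return $self;
}

sub count_items {
    my ($self) = @_;
    return scalar @{ $self->{items} };
}
""",
    "random bytes": os.urandom(2048),  # stand-in for encrypted data
}

for name, data in samples.items():
    print(f"{name:14s} {space_saving(data):7.1%}")
```

Running this over a batch of your own plain and encrypted files should show a clear gap between the two groups. A threshold somewhere in that gap is the one to use.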

Re: Re (tilly) 1: What data is code?
by jepri (Parson) on Nov 22, 2001 at 06:28 UTC
    Even better, compressors like zlib gain nothing on data with few repeating patterns. Zlib (which implements DEFLATE, an LZ77-based algorithm, rather than LZW) can produce a 'compressed' output that is slightly larger than the original. This should happen in the cases you describe in your post. As you say, programs tend to compress well due to whitespace and repeated variable names, so you only have to check whether the output from zlib is larger than the original.
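    The size check described here fits in one line. This is a sketch in Python's standard zlib module rather than Compress::Zlib. The name zlib_expands is hypothetical, and os.urandom output stands in for encrypted data.

```python
import os
import zlib

def zlib_expands(data: bytes) -> bool:
    """True when zlib's output is longer than its input (no exploitable redundancy)."""
    return len(zlib.compress(data)) > len(data)

program = b"my $name = shift; print $name; " * 20  # repetitive, program-like text
noise = os.urandom(4096)  # stand-in for encrypted data

print(zlib_expands(program))
print(zlib_expands(noise))
```

    The growth on incompressible data is small rather than catastrophic. It comes from the 2-byte zlib header, the 4-byte Adler-32 checksum, and about 5 bytes per stored DEFLATE block (up to 64 KiB each). The same fixed overhead means very short inputs always "expand", even when they are plain text, so this check needs a minimum input length.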

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

Re: Re (tilly) 1: What data is code?
by Beatnik (Parson) on Nov 22, 2001 at 04:01 UTC
    I'm keeping the Compress stuff for another module set :)))

    Anyway, uhm, the cipher and keyphrase are entirely user-dependent. The key feature is the encryption, not really the compression (although filter stacking should be doable). Paul Marquess does provide a simple compression filter in Filter, though.
    Compression would have the same problem: how do you know whether the data is compressed or not?

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
      Um, there is no problem.

      If you attempt to compress already compressed and/or encrypted data, you should not get significant further compression. If you do, then your original encryption or compression was of pretty poor quality. But if you take realistic text (e.g. program code) and use popular compression algorithms, you will reliably get significant compression. (Which is why people use them in the first place.)

      Therefore, by taking some text and attempting to compress it, you can tell normal data from compressed or encrypted data. For instance, you might say that if you can reduce its length by 20% or more, it is normal text. Aside from a few short text sequences, you are unlikely to go wrong with a test like this.
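      The 20% rule can be written as a predicate. This is a minimal sketch, assuming Python's standard zlib module in place of Compress::Zlib. The function name is_plain_text and its min_saving parameter are hypothetical.

```python
import os
import zlib

def is_plain_text(data: bytes, min_saving: float = 0.20) -> bool:
    """True if zlib shrinks data by at least min_saving (20% by default).

    Heuristic only: very short inputs come out False even when they are text,
    because zlib's fixed header and checksum outweigh any savings.
    """
    if not data:
        return False
    return len(zlib.compress(data)) <= (1 - min_saving) * len(data)

print(is_plain_text(b"print 'hello world';" * 50))  # repetitive text: True
print(is_plain_text(os.urandom(4096)))              # random bytes: almost certainly False
```

      The short-input caveat is the "few short text sequences" case mentioned above. For example, a two-byte string grows to several bytes under zlib and is classified as not plain text.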

      There is, however, no way upon casual inspection to distinguish compressed data from encrypted data from white noise. The reasons for this involve information theory.
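      Here is a small demonstration of that limit. Already-compressed text and random bytes (standing in for ciphertext) both fail to shrink, so a compression test can separate them from plain text but not from each other. The sketch uses Python's standard zlib and random modules. The vocabulary and helper name are hypothetical.

```python
import os
import random
import zlib

def saving(data: bytes) -> float:
    """Fraction of the input length removed by zlib (negative if it grew)."""
    return 1 - len(zlib.compress(data)) / len(data) if data else 0.0

# Program-like text from a small vocabulary, seeded so runs are repeatable.
vocabulary = "my our sub return if else unless foreach while print push shift keys values".split()
rng = random.Random(1)
text = " ".join(rng.choice(vocabulary) for _ in range(3000)).encode()

already_compressed = zlib.compress(text)
noise = os.urandom(len(already_compressed))  # stand-in for encrypted data

for label, data in [("text", text), ("compressed text", already_compressed), ("random bytes", noise)]:
    print(f"{label:16s} saving: {saving(data):6.1%}")
```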

        The problem I spotted: if you detect that the data is not encrypted, you encrypt it, write it to the file and exit, so the code you actually wanted to encrypt is never executed. Also, suppose you can somehow detect that the code is encrypted (after, uhm, let's say pass 3); then you'd have to decrypt it X times (e.g. 3 passes).

        Greetz
        Beatnik