Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^2: Efficient 7bit compression

by abell (Chaplain)
on Mar 15, 2005 at 14:05 UTC ( [id://439630]=note: print w/replies, xml ) Need Help??


in reply to Re: Efficient 7bit compression
in thread Efficient 7bit compression

In general, all lossless compression algorithms will produce output longer than the input for some values of the input.

Here is a sketchy proof:
there is a finite number of strings of length at most n. If none of them were compressed to a longer string, it would mean that their compressed values would be again the set of all strings of length at most n (i.e. the compression would be a permutation of said set). This implies that any string would be "compressed" to a string of the same length.
In other words, the only 1-1 injective (i.e. lossless) mappings that never cause length to increase are those that keep length the same.

Update: I changed 1-1 with injective, which makes the statement a bit stronger and more directly related to the sketch of proof. As a side note, I should point out that compression algorithms should tend to be 1-1 anyway, so as to avoid "wasting" short strings.
As to MidLifeXis's comment, of course by "compression algorithm" I meant something which strictly compresses at least one input, as pointed out by tilly.

Cheers

Antonio


The stupider the astronaut, the easier it is to win the trip to Vega - A. Tucket

Replies are listed 'Best First'.
Re^3: Efficient 7bit compression
by MidLifeXis (Monsignor) on Mar 15, 2005 at 17:58 UTC

    One small nit...

    In general, all lossless compression algorithms will produce output equal in length or longer than the input for some values of the input.

    Use an identity function as your compressor.

    Otherwise, good back of the napkin / handwavey proof.

    --MidLifeXis

      Your nit is incorrect and the proof (at least the one that is there now) is completely correct. If a lossless compression algorithm compresses any string, then there must be some other string that is made longer by that algorithm. If a lossless transform does not make any strings longer, then it also cannot make any shorter and so isn't a compression algorithm.

      A more formal version of the proof is as follows. Suppose that a lossless compression algorithm manages to compress some string of length longer than n to length n or less. There are exactly 2**n + 2**(n-1) + ... + 2**0 = 2**(n+1) - 1 strings of length n or less. If the algorithm does not make any string of length n or less longer than n, then at least 2**(n+1) strings must map into this set, and by the pidgeonhole principle at least 2 strings are sent to the same result. However if we get that resulting then we can't figure out the original string (because we could have started with 2 different strings) contradicting our assumption that the algorithm was lossless.

      Therefore a lossless compression algorithm that manages to compress any string must make some other string longer.

      (Of course in practice one hopes that the strings which are made longer are likely to be rare and not made much longer, while the strings that are made shorter are likely to be common and made much shorter...)

        First, the proof (or my parsing of the proof :) changed since I first responded. If it did not change, then I offer my apologies to abell and blame it on my not consuming enough caffine.

        As I read it now, the proof appears to be correct, even if you use an identity function (or something that looks like an identity function for a given x) for f(x) (example, replace all /a+/ with "a" . length($1) . "a" - a win for all /a+/ longer than 3, identity for /^[^a]*$/).

        Like I said, a minor nit, which was plugged by abell or originally misread by /me.

        --MidLifeXis

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://439630]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-04-26 07:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found