in reply to Re: How to efficently pack a string of 63 characters
in thread How to efficently pack a string of 63 characters
I'm pretty sure that zip has some fixed overhead which doesn't pay off with just 189 bytes of input.
> my @code = map glob('{A,B,C}'x $_), 5, 2, 1;
I haven't run your code, but it looks like you are mapping 9 possible chunks to a character needing a byte.
Looks like you are wasting space. Even a naive 4-bit-per-chunk approach, i.e. 1 byte for two chunks, would double your efficiency.° (Needless to say, Huffman would do even better.) (Update: sorry, I totally misunderstood how your glob works.)
FWIW I doubt the input sample is realistic. Looks handmade by copying and altering the first line, hence the high redundancy. :)
Cheers Rolf
(addicted to the Perl Programming Language :)
Re^3: How to efficently pack a string of 63 characters
by tybalt89 (Monsignor) on Sep 09, 2021 at 23:39 UTC
I think you might be misreading the glob.
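To make the point concrete, here is a minimal sketch of what that glob line expands to and how such a table could be used. Only the `@code` line is taken from the post above; the byte assignment (1..255), the helper names `pack_abc` and `unpack_abc`, and the sample input are assumptions for illustration, not necessarily the original code.

```perl
use strict;
use warnings;

# Every 5-letter, 2-letter and 1-letter string over A, B, C, in that order:
# 3**5 + 3**2 + 3**1 = 255 chunks, so each chunk fits into a single byte.
my @code = map glob('{A,B,C}' x $_), 5, 2, 1;

# Assumed byte assignment for illustration: chunk number i becomes chr(i).
my %byte_for;
@byte_for{@code} = map chr, 1 .. @code;
my %chunk_for = reverse %byte_for;

# Alternation is tried left to right, so 5-letter chunks are preferred
# and the shorter chunks only pick up what is left at the end.
my $alt      = join '|', @code;
my $chunk_re = qr/$alt/;

sub pack_abc   { my ($s) = @_; $s =~ s/($chunk_re)/$byte_for{$1}/g; return $s }
sub unpack_abc { my ($s) = @_; $s =~ s/(.)/$chunk_for{$1}/gs;       return $s }

my $in  = 'ABC' x 63;    # hypothetical 189-character input, not the OP's data
my $out = pack_abc($in);
printf "%d codes, %d chars -> %d bytes\n", scalar @code, length $in, length $out;
```

So the table does not map 9 chunks to a byte each; it maps up to five characters to one byte, and the 2- and 1-letter entries only handle the tail.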
by LanX (Saint) on Sep 10, 2021 at 07:45 UTC
I agree that this 5-to-1 packing is almost optimal if the input is really random, i.e. without redundancy. 3**5 = 243, which means you are using 7.92 bits of each byte. Plus a bit more for the smaller trailing chunks. That's very efficient: the theoretical optimum is about 37.4 bytes (189 * log2(3) / 8) and you only need 39. But I think zip should do considerably better than 40% if this particular raw input were longer. (Update: as proven here.)
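The arithmetic behind those numbers can be checked with a few lines. This is just a sketch assuming 3 lines of 63 characters over a 3-letter alphabet, as in the thread; the variable names are made up.

```perl
use strict;
use warnings;

my $chars = 3 * 63;                     # three lines of 63 characters = 189
my $bits  = log(3) / log(2);            # information per character, about 1.585 bits

# A 5-character chunk carries log2(3**5) bits but occupies a full 8-bit byte.
printf "bits used per byte:  %.2f of 8\n", 5 * $bits;

# Lower bound for random input: total information divided by 8 bits per byte.
my $optimum = $chars * $bits / 8;

# 5,2,1 packing: full 5-chunks first, then 2-chunks, then single characters.
my $rest   = $chars % 5;
my $packed = int($chars / 5) + int($rest / 2) + $rest % 2;

printf "theoretical optimum: %.1f bytes\n", $optimum;
printf "5,2,1 packing:       %d bytes\n",   $packed;
```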
Cheers Rolf
Re^3: How to efficently pack a string of 63 characters (longer input)
by LanX (Saint) on Sep 10, 2021 at 11:32 UTC
To prove my point, here is an altered version of your code which fakes longer data by rotating the original input, showing zip at 5% compression. That's a factor of 4 better than your champion. (Of course, rotating is kind of biased, because it keeps most run-length chunks intact and zip will efficiently Huffman-code all the runs it finds.° But it's up to the OP to provide unbiased data, I'm no psychic... ;-) FWIW: zip is already at 18% after only tripling the input.
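A rough sketch of the rotation idea, not the script referred to above: it uses the core module Compress::Zlib as a stand-in for zip (the container overhead differs), and the input is a made-up, highly repetitive 189-character string, so the percentages it prints will not match the figures quoted in this thread. The helpers `rotated` and `ratio` are hypothetical names.

```perl
use strict;
use warnings;
use Compress::Zlib;    # core module; compress() returns a zlib (deflate) stream

# Hypothetical stand-in for the OP's data: one 63-character line over A, B, C,
# copied three times with a single character altered in two of the copies.
my $line  = 'ABCCBAABBC' x 6 . 'ABC';
my @lines = ($line) x 3;
substr($lines[1], 10, 1) = 'C';
substr($lines[2], 40, 1) = 'B';
my $input = join '', @lines;             # 189 characters

# Fake longer input by concatenating rotated copies of the original string.
sub rotated {
    my ($s, $copies) = @_;
    my $n = length $s;
    return join '', map {
        my $k = $_ % $n;
        substr($s, $k) . substr($s, 0, $k);
    } 0 .. $copies - 1;
}

# Compressed size as a fraction of the uncompressed size.
sub ratio {
    my ($data) = @_;
    return length(compress($data)) / length($data);
}

printf "%3d copies: %5.1f%%\n", $_, 100 * ratio(rotated($input, $_)) for 1, 3, 100;
```

The fixed overhead dominates for a single copy and fades as the input grows, which is the effect described above.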
Cheers Rolf
°) Even after constantly reversing one part of the input I'm at 9% compression for factor 100 input.
by bliako (Abbot) on Sep 10, 2021 at 14:11 UTC
Here is a random_data() which simulates the statistical properties of the 3 lines of data baxy77bax provided. (5' EDIT on the script below, nothing important.)
bw, bliako
by LanX (Saint) on Sep 10, 2021 at 15:41 UTC
Cheers Rolf
by bliako (Abbot) on Sep 10, 2021 at 17:32 UTC
by LanX (Saint) on Sep 10, 2021 at 19:23 UTC
by LanX (Saint) on Sep 10, 2021 at 13:09 UTC
tybalt89's 5,2,1 wins with a constant 20%, right at the optimum, but zip is second best with a spectacular 23-24%. That's the worst case for zip, when it can't find symmetric patterns or an uneven distribution. But it's still compressing within close range of the optimum.
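For comparison, a small sketch of that worst case, assuming uniformly random A/B/C data and Compress::Zlib standing in for zip. The seed, the sample size and the variable names are arbitrary choices for illustration, and the exact percentage will vary with the data and the zlib version.

```perl
use strict;
use warnings;
use Compress::Zlib;    # core module; compress($data, $level)

srand(42);             # arbitrary seed, only to make runs repeatable on one machine
my @abc  = qw(A B C);
my $data = join '', map { $abc[rand @abc] } 1 .. 100_000;

# Three ways to look at the same input, as a fraction of the original size.
my $optimum = log(3) / log(2) / 8;      # entropy bound for random 3-letter data
my $chunked = 1 / 5;                    # 5 characters per byte, ignoring the tail
my $zipped  = length(compress($data, 9)) / length($data);

printf "optimum %.1f%%, 5-per-byte %.1f%%, zlib %.1f%%\n",
    100 * $optimum, 100 * $chunked, 100 * $zipped;
```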
Cheers Rolf
by tybalt89 (Monsignor) on Sep 10, 2021 at 15:26 UTC
And tybalt89 does the "I beat gzip" happy dance :)