Re: How to efficiently pack a string of 63 characters

If the sample is real and representative, it looks like consecutive lines are very similar to each other, much like rows of image data.

I'd try:

1. calculate the deltas between consecutive lines, e.g. C-A=2
2. run-length encode those deltas, e.g. (2,5),(0,20),...
3. possibly Huffman-code the run-length pairs.
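Steps 1 and 2 can be sketched as follows. This is a minimal sketch in Python, with the standard library standing in for Perl, and all function names are hypothetical. It assumes every line has the same length. It stores each run as a (value, run_length) pair, which is one plausible reading of the (2,5),(0,20) notation above.

```python
from itertools import groupby
from typing import List, Tuple

Pair = Tuple[int, int]  # (delta value, run length)

def line_deltas(prev: str, cur: str) -> List[int]:
    """Per-position character difference between two lines, e.g. 'C' - 'A' == 2."""
    if len(prev) != len(cur):
        raise ValueError("lines must have equal length")
    return [ord(c) - ord(p) for p, c in zip(prev, cur)]

def run_length_encode(values: List[int]) -> List[Pair]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

def run_length_decode(pairs: List[Pair]) -> List[int]:
    """Expand (value, run_length) pairs back into the flat delta list."""
    return [v for v, n in pairs for _ in range(n)]

def encode_lines(lines: List[str]) -> Tuple[str, List[List[Pair]]]:
    """Keep the first line verbatim; each later line becomes RLE'd deltas to its predecessor."""
    pairs = [run_length_encode(line_deltas(a, b)) for a, b in zip(lines, lines[1:])]
    return lines[0], pairs

def decode_lines(first: str, pairs: List[List[Pair]]) -> List[str]:
    """Rebuild all lines by re-applying the decoded deltas one line at a time."""
    lines = [first]
    for line_pairs in pairs:
        deltas = run_length_decode(line_pairs)
        lines.append("".join(chr(ord(p) + d) for p, d in zip(lines[-1], deltas)))
    return lines

if __name__ == "__main__":
    print(run_length_encode(line_deltas("AAAAAB", "CCCCCB")))  # [(2, 5), (0, 1)]
```

Decoding reverses both steps, so `decode_lines(*encode_lines(lines))` returns the original list for any equal-length lines.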

You could also skip step 3 and zip the output of step 2 directly, because zip's DEFLATE compression already uses Huffman coding internally.
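In case you do want an explicit step 3, here is a sketch of a textbook Huffman code built over the run-length pairs. It is written in Python with hypothetical names, and it illustrates the idea only. DEFLATE builds its own length-limited Huffman tables, so this is not what zip does internally.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Map each distinct symbol to a prefix-free bit string; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit per occurrence
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heapq never has to compare the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    """Greedy decode; correct because no code word is a prefix of another."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

if __name__ == "__main__":
    pairs = [(2, 5), (0, 20), (2, 5), (0, 1), (2, 5)]
    code = huffman_code(pairs)
    print(code, huffman_encode(pairs, code))
```

The bit string still has to be packed into bytes, and the code table has to be stored alongside it. For short inputs that overhead is one reason to let zip handle this step instead.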

NB: feeding binary data into zip might be counterproductive. A "readable" text in which the patterns stay visible is probably better.
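To check this NB against real data, one option is to serialize the same pairs twice and compress both forms. The sketch below, in Python with made-up names and text layout, writes the pairs once as readable text and once as packed signed bytes, then compresses both with zlib (Python's binding to the DEFLATE library). Which form ends up smaller depends on the data, so it only reports the two sizes.

```python
import struct
import zlib
from typing import List, Tuple

Pair = Tuple[int, int]  # (delta value, run length)

def pairs_as_text(lines: List[List[Pair]]) -> bytes:
    """Readable form such as '2,5 0,20', one output line per input line."""
    return "\n".join(" ".join(f"{v},{n}" for v, n in line) for line in lines).encode("ascii")

def pairs_as_binary(lines: List[List[Pair]]) -> bytes:
    """Packed form: every value and run length as one signed byte (only -128..127 fits).
    Line boundaries are not stored; they are recoverable because each line's runs sum to its length."""
    flat = [x for line in lines for pair in line for x in pair]
    return struct.pack(f"{len(flat)}b", *flat)

def zipped_sizes(lines: List[List[Pair]]) -> Tuple[int, int]:
    """(text size, binary size) in bytes after zlib at its highest compression level."""
    return (len(zlib.compress(pairs_as_text(lines), 9)),
            len(zlib.compress(pairs_as_binary(lines), 9)))

if __name__ == "__main__":
    sample = [[(2, 5), (0, 58)], [(0, 63)], [(-2, 1), (0, 62)]] * 100  # hypothetical data
    text_size, binary_size = zipped_sizes(sample)
    print(f"text: {text_size} bytes, binary: {binary_size} bytes")
```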

Cheers Rolf
(addicted to the Perl Programming Language :)

Wikisyntax for the Monastery

Re^2: How to efficiently pack a string of 63 characters by LanX (Saint) on Sep 11, 2021 at 14:11 UTC
> calculate the deltas between two lines like C-A=2

OK, I tried this and fed the result into IO::Compress::Gzip. In all scenarios it gained at most one percent.

I'd say gzip is the best choice here. Without any prior knowledge it compressed the data to between 4% and 24% (worst case) of its original size, and 19.81% is the optimum for the worst case of maximum entropy.

I also tried passing extra options for best compression and a minimal header, but again the benefit was at most 1%.

And this module is in core:

```
> corelist IO::Compress::Gzip

Data for 2021-01-23
IO::Compress::Gzip was first released with perl v5.9.4
```

I think the decision is a no-brainer ...

Cheers Rolf
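The 19.81% figure matches log2(3)/8. That is the size limit for three equally likely symbols stored as 8-bit characters, which is an assumption inferred from the number rather than something stated here. The sketch below computes that bound and measures gzip against it on hypothetical random "ABC" records. It uses Python's gzip and zlib as stand-ins for IO::Compress::Gzip. Level 9 plays the role of "best compression", and a raw DEFLATE stream, with no gzip header or trailer, shows how much the container itself costs. It is not the same thing as gzip's minimal-header option.

```python
import gzip
import math
import random
import zlib

def entropy_bound(alphabet_size, bits_per_char=8):
    """Best possible compressed size / original size for uniform, independent symbols."""
    return math.log2(alphabet_size) / bits_per_char

def raw_deflate(data, level=9):
    """DEFLATE stream without any gzip/zlib header or trailer (negative wbits)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def compare(data, alphabet_size):
    """Compressed-size ratios next to the theoretical bound."""
    gz = gzip.compress(data, compresslevel=9, mtime=0)  # mtime=0 keeps output reproducible
    return {
        "bound": entropy_bound(alphabet_size),
        "gzip": len(gz) / len(data),
        "raw_deflate": len(raw_deflate(data)) / len(data),
    }

if __name__ == "__main__":
    random.seed(1)
    # Hypothetical worst case: 1,000 fixed-length 63-char records of random "ABC",
    # concatenated without separators.
    data = "".join(random.choice("ABC") for _ in range(63 * 1000)).encode("ascii")
    for name, ratio in compare(data, 3).items():
        print(f"{name:12s} {ratio:.2%}")
```

On very short inputs, such as a single 63-character string, the fixed gzip header and trailer can outweigh any savings. In that case compressing many records together matters more than any flag.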
Re^3: How to efficiently pack a string of 63 characters by tybalt89 (Monsignor) on Sep 11, 2021 at 15:09 UTC
In my testing I also tried IO::Compress::Bzip2, which was always slightly better than Gzip and is also in core:

```
corelist IO::Compress::Bzip2

Data for 2021-05-20
IO::Compress::Bzip2 was first released with perl v5.10.1
```
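To repeat this comparison on your own data, the sketch below compresses the same bytes with gzip and with bzip2 at their highest levels. It uses Python's gzip and bz2 modules as stand-ins for the Perl ones, and the random "ABC" records are hypothetical test input. Results on real data may differ, so it reports sizes rather than assuming a winner.

```python
import bz2
import gzip
import random

def gzip_vs_bzip2(data: bytes) -> dict:
    """Compressed size in bytes of `data` under gzip and bzip2, both at level 9."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9, mtime=0)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }

if __name__ == "__main__":
    random.seed(2)
    # Hypothetical test data: 1,000 random 63-character records over "ABC".
    data = "".join(random.choice("ABC") for _ in range(63 * 1000)).encode("ascii")
    for name, size in gzip_vs_bzip2(data).items():
        print(f"{name:6s} {size:6d} bytes ({size / len(data):.2%})")
```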
Re^4: How to efficiently pack a string of 63 characters by LanX (Saint) on Sep 11, 2021 at 16:44 UTC
> I also tried IO::Compress::Bzip2 which was always slightly better

Oh, the Burrows–Wheeler transform is fun. It could indeed squeeze out more redundancy ... perhaps even when pre-processing the lines.

Anyway, without real data it's a rather theoretical discussion.

Cheers Rolf
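To see what the Burrows–Wheeler transform does to a line, here is a naive sketch in Python. It sorts all rotations, which is fine for 63-character strings but far slower than what bzip2 uses internally. It groups equal characters into runs and can be undone exactly. The `\0` sentinel is an assumption: it must not occur in the input and must sort before every real character.

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations of s + sentinel."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(t: str, sentinel: str = "\0") -> str:
    """Rebuild the sorted rotation table column by column, then pick the row ending in the sentinel."""
    table = [""] * len(t)
    for _ in range(len(t)):
        table = sorted(c + row for c, row in zip(t, table))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]

if __name__ == "__main__":
    line = "ABCABCABCABCAABB"  # hypothetical record
    transformed = bwt(line)
    print(repr(transformed), inverse_bwt(transformed) == line)
```

Used as a pre-processing step, the transform only reorders characters. The gain comes from a later stage, such as run-length encoding or zip, that can exploit the longer runs.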