in reply to How to efficently pack a string of 63 characters

If the sample is real and representative, then it looks like there is a lot of similarity between adjacent lines, akin to image data.

I'd try:

You can skip step 3 and try zipping directly after step 2, though, because Huffman coding is part of zip IIRC.

NB: feeding binary data into zip might be counterproductive. Better to feed it a "readable" text where the patterns are visible.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^2: How to efficently pack a string of 63 characters
by LanX (Saint) on Sep 11, 2021 at 14:11 UTC
    > calculate the deltas between two lines like C-A=2

    OK, I tried this and fed the result into IO::Compress::Gzip. In all scenarios this yielded a gain of at most one percent.
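A minimal sketch of that experiment, not the code actually used in the test. It assumes equal-length lines over a three-letter alphabet A..C; the sample lines and the helpers `delta_encode` and `delta_decode` are hypothetical. Deltas are taken per column, modulo the alphabet size, and written as digits so the text stays "readable" for gzip.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical sample: equal-length lines over the alphabet A..C.
my @lines = qw(ABCABCCA ABCABCCB ACCABCCB);
my $base  = ord 'A';
my $size  = 3;    # assumed alphabet size

# Keep the first line; replace each later line by its per-column
# difference to the previous line, modulo the alphabet size (C-A=2).
sub delta_encode {
    my @in  = @_;
    my @out = ( $in[0] );
    for my $i ( 1 .. $#in ) {
        my @prev = split //, $in[ $i - 1 ];
        my @cur  = split //, $in[$i];
        push @out, join '',
            map { ( ord( $cur[$_] ) - ord( $prev[$_] ) ) % $size } 0 .. $#cur;
    }
    return @out;
}

# Inverse: rebuild each line from the already decoded previous line.
sub delta_decode {
    my @in  = @_;
    my @out = ( $in[0] );
    for my $i ( 1 .. $#in ) {
        my @prev = split //, $out[ $i - 1 ];
        my @d    = split //, $in[$i];
        push @out, join '',
            map { chr( $base + ( ord( $prev[$_] ) - $base + $d[$_] ) % $size ) }
            0 .. $#d;
    }
    return @out;
}

my $plain = join "\n", @lines;
my $delta = join "\n", delta_encode(@lines);

# Compress both variants in memory and compare the sizes.
for my $case ( [ plain => $plain ], [ delta => $delta ] ) {
    my ( $name, $text ) = @$case;
    my $gz;
    gzip \$text => \$gz or die "gzip failed: $GzipError\n";
    printf "%-5s %3d -> %3d bytes\n", $name, length $text, length $gz;
}
```

On a toy input this small the gzip header dominates, so the sizes only become meaningful with a realistic number of lines.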

    I'd say gzip is the best choice here: without any prior knowledge it compressed the data to between 4% and 24% (worst case) of the original size. For comparison, 19.81% is the optimum for the worst case of total entropy.
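The 19.81% figure matches log2(3)/8, which would be the theoretical floor for uniformly random characters from a three-letter alphabet stored one byte each. That the alphabet has three symbols is an assumption here, inferred from the number and from the C-A=2 example. The helper `entropy_ratio` is hypothetical.

```perl
use strict;
use warnings;

# Minimum fraction of an 8-bit-per-character encoding that is needed
# when all $symbols characters are equally likely and independent.
sub entropy_ratio {
    my ($symbols) = @_;
    my $bits_per_char = log($symbols) / log(2);
    return $bits_per_char / 8;
}

# Assumption: a three-letter alphabet such as A, B, C.
printf "%.2f%%\n", 100 * entropy_ratio(3);    # prints 19.81%
```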

    I also tried providing extra flags for best compression and a minimal header, but the benefit was again only 1% at most.
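Which flags were used is not stated. One plausible way to ask for this, as a sketch, is the `Level` and `Minimal` options of IO::Compress::Gzip. The input text below is made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical input: a few similar lines, repeated.
my $text = join "\n", ( qw(ABCABCCA ABCABCCB ACCABCCB) ) x 20;

my ( $default, $tuned );
gzip \$text => \$default or die "gzip failed: $GzipError\n";

# Level 9 is zlib's maximum compression; Minimal asks for the
# smallest compliant gzip header.
gzip \$text => \$tuned, Level => 9, Minimal => 1
    or die "gzip failed: $GzipError\n";

# Check that the tuned stream still decompresses to the input.
my $back;
gunzip \$tuned => \$back or die "gunzip failed: $GunzipError\n";

printf "input %d, default %d, tuned %d bytes\n",
    length $text, length $default, length $tuned;
```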

    And this module is in core.

    > corelist IO::Compress::Gzip
    Data for 2021-01-23
    IO::Compress::Gzip was first released with perl v5.9.4

    I think the decision is a no-brainer ...

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      In my testing I also tried IO::Compress::Bzip2, which was always slightly better than Gzip and is also in core.

      > corelist IO::Compress::Bzip2
      Data for 2021-05-20
      IO::Compress::Bzip2 was first released with perl v5.10.1
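A small sketch for repeating that comparison on your own data. The input here is made up, so the printed sizes say nothing about the original test set.

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical input: replace with the real records.
my $text = join "\n", ( qw(ABCABCCA ABCABCCB ACCABCCB) ) x 20;

# Compress the same buffer with both core modules.
my ( $gz, $bz );
gzip  \$text => \$gz or die "gzip failed: $GzipError\n";
bzip2 \$text => \$bz or die "bzip2 failed: $Bzip2Error\n";

# Check that the bzip2 stream decompresses to the input again.
my $back;
bunzip2 \$bz => \$back or die "bunzip2 failed: $Bunzip2Error\n";

printf "input %d, gzip %d, bzip2 %d bytes\n",
    length $text, length $gz, length $bz;
```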