Update: Removed an errant print from the code, as pointed out by cciulla below.

Also, this was intended as a slightly tongue-in-cheek answer in response to the (golf) nomenclature in the (original) title. I apologise for omitting the smiley. See my addendum below by way of recompense.

</update>

How about 10:1 compression?

```perl
use Devel::Size qw[total_size];

my @a = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );
my $packed = pack 'C*', @a;

print 'Memory requirement of ', total_size(\@a),
      ' bytes is reduced to ', total_size(\$packed), ' bytes';
```

```
Memory requirement of 476 bytes is reduced to 46 bytes
```
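For completeness, here is a minimal sketch (not from the original post) showing that `unpack 'C*'` gets the list back from the packed string. One unsigned byte per value is lossless as long as every value fits in 0-255. It uses only core `pack`/`unpack`, so Devel::Size is not needed.

```perl
use strict;
use warnings;

my @data = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );

# One unsigned byte per value: 21 values -> 21 bytes of string data
my $packed = pack 'C*', @data;

# Unpacking with the same template recovers the original list
my @back = unpack 'C*', $packed;

print length($packed), " bytes; round trip ",
      ( "@back" eq "@data" ? 'ok' : 'FAILED' ), "\n";
```

`length $packed` is 21. The larger figure reported by `total_size` also counts perl's own per-scalar bookkeeping, not just the payload.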

Addendum

There are two basic ways of compressing character data.

Bitwise reduction (there's probably a better term). Here you reduce the storage requirement by using fewer than 8 bits per character. For example, if you only needed to represent uppercase letters, you could get away with 5 bits/char, so you could get a 3/8ths reduction by packing them into a bitstream.
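Here is a minimal sketch of the 5-bits-per-letter idea, assuming the input is uppercase A-Z only, mapped to 0-25. The sample string `PERLMONK` is hypothetical. `vec` only handles power-of-two widths, so this sketch builds a string of '0'/'1' characters and lets `pack 'B*'` turn it into bytes.

```perl
use strict;
use warnings;

# Hypothetical input: uppercase letters only, so each fits in 5 bits (A=0 .. Z=25)
my $text = 'PERLMONK';

# Encode: each letter becomes a 5-character bit string; join and pack MSB-first
my $bits   = join '', map { sprintf '%05b', ord($_) - ord('A') } split //, $text;
my $packed = pack 'B*', $bits;

# Decode: unpack to a bit string, cut it into 5-bit fields, map back to letters.
# The character count has to be known (or stored) to ignore any padding bits.
my $nchars   = length $text;
my $unbits   = unpack 'B*', $packed;
my $restored = join '', map { chr( oct("0b$_") + ord('A') ) }
                        unpack "(A5)$nchars", $unbits;

printf "%d chars -> %d bytes, restored '%s'\n", $nchars, length $packed, $restored;
```

Eight letters take 40 bits, which is exactly 5 bytes instead of 8: the 3/8ths reduction. For lengths that are not a multiple of 8, the last byte is padded.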

But as you have 21 bytes with values in the range 0-126 (all below 128, so 7 bits each), you would at best be saving 1 bit per byte. 21 bits saves 2 bytes! Hardly worth the effort.
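To make the arithmetic concrete, here is a sketch that packs the sample data at 7 bits per value using the same bit-string approach. 21 × 7 = 147 bits, which rounds up to 19 bytes, against 21 bytes at 8 bits per value.

```perl
use strict;
use warnings;

my @data = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );

# Every value is below 128, so 7 bits each is enough
my $bits   = join '', map { sprintf '%07b', $_ } @data;   # 21 * 7 = 147 bits
my $packed = pack 'B*', $bits;                             # rounded up to 19 bytes

# Decode: back to a bit string, then split into 7-bit fields
my $n      = @data;
my $unbits = unpack 'B*', $packed;
my @back   = map { oct "0b$_" } unpack "(A7)$n", $unbits;

printf "%d values: %d bytes as 8-bit, %d bytes as 7-bit\n",
       $n, length pack( 'C*', @data ), length $packed;
```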

Then there is the dictionary method, which you tried yourself. Here you build a dictionary of the common bytes (strings of common byte sequences actually work better, but your sample data is too short and varied for this to work well) and then represent the bytes by indices into the dictionary. The problem, as you saw, is that replacing each byte with another byte, plus storing the dictionary, makes things worse rather than better. However, if you then use the first technique to reduce the storage requirement of the indices, you get somewhere.
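Here is a minimal sketch of building the dictionary and index list from the sample data. It assumes the dictionary is simply the distinct values in numeric order, which reproduces the dictionary and indices used in this post. It also shows why one byte per index loses: 8 + 21 = 29 bytes, against 21 raw.

```perl
use strict;
use warnings;

my @data = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );

# Dictionary: the distinct byte values, sorted numerically
my %seen;
my @dict = sort { $a <=> $b } grep { !$seen{$_}++ } @data;

# Map each value to its position in the dictionary
my %index;
@index{@dict} = 0 .. $#dict;
my @ind = map { $index{$_} } @data;

print "dict: [@dict]\n";
print "ind:  (@ind)\n";

# Naive storage: dictionary plus one whole byte per index
my $naive = pack( 'C*', @dict ) . pack( 'C*', @ind );
printf "dictionary + 1-byte indices: %d bytes (vs %d raw)\n",
       length $naive, scalar @data;
```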

Your dictionary has 8 entries [0,28,48,54,60,62,76,126] (which is unfortunate; if it were only 7, the compression would be greater). That means you need 4 bits per byte for the indices.

Update 2: I was having a bad day. 8 values can be indexed with 3 bits! So 21 × 3 = 63 bits, which rounds up to 8 bytes, not 11, so a total of 16 bytes is all that is required. In addition, 1 bit per byte of the dictionary could be shed (every entry is below 128), saving another byte.
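Here is a sketch of the corrected scheme: 3-bit indices plus a 7-bit-per-entry dictionary. `vec` cannot do 3- or 7-bit fields, so this sketch swaps in `pack 'B*'` on bit strings. It assumes the two counts (8 entries, 21 indices) are known to the decoder; in practice they would need storing somewhere too.

```perl
use strict;
use warnings;

my @data = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );

# Build the dictionary and index list
my %seen;
my @dict = sort { $a <=> $b } grep { !$seen{$_}++ } @data;
my %index;
@index{@dict} = 0 .. $#dict;
my @ind = map { $index{$_} } @data;

# Smallest width that can index every dictionary entry (3 bits for 8 entries)
my $w = 1;
$w++ while ( 1 << $w ) < @dict;

# Pack indices at $w bits and dictionary entries at 7 bits each
my $ind_packed  = pack 'B*', join '', map { sprintf "%0${w}b", $_ } @ind;   # 63 bits -> 8 bytes
my $dict_packed = pack 'B*', join '', map { sprintf '%07b', $_ } @dict;     # 56 bits -> 7 bytes

printf "index width %d bits: %d + %d = %d bytes\n",
    $w, length $ind_packed, length $dict_packed,
    length($ind_packed) + length($dict_packed);

# Decode, assuming the counts are known
my $n_ind     = @ind;
my $n_dict    = @dict;
my $ind_bits  = unpack 'B*', $ind_packed;
my $dict_bits = unpack 'B*', $dict_packed;
my @ind_back  = map { oct "0b$_" } unpack "(A$w)$n_ind",  $ind_bits;
my @dict_back = map { oct "0b$_" } unpack "(A7)$n_dict", $dict_bits;
my @restored  = @dict_back[@ind_back];

print "restored: @restored\n";
```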

```perl
my @ind = (0,3,1,6,7,0,1,3,5,3,0,1,3,4,2,0,3,3,5,3,0);
my $ind = '';
vec( $ind, $_, 4 ) = $ind[$_] for 0 .. $#ind;
print length $ind;    # prints 11
```

This has allowed you to pack the 21 indices into 11 bytes. But now you need to concatenate that with the 8-byte dictionary, and you are back to 19 bytes!
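Here is a sketch of that 19-byte layout, using the 4-bit `vec` packing from the post with the dictionary stored one byte per entry in front. It assumes the decoder knows the sizes (8 entries, 21 indices), and it decodes back to the original data.

```perl
use strict;
use warnings;

my @dict = ( 0, 28, 48, 54, 60, 62, 76, 126 );
my @ind  = ( 0,3,1,6,7,0,1,3,5,3,0,1,3,4,2,0,3,3,5,3,0 );

# 4-bit indices via vec(): 21 nibbles -> 11 bytes
my $ind = '';
vec( $ind, $_, 4 ) = $ind[$_] for 0 .. $#ind;

# Prepend the dictionary, one byte per entry
my $blob = pack( 'C*', @dict ) . $ind;
print length $blob, "\n";    # 8 + 11 = 19

# Decode: split the blob back apart (sizes 8 and 21 assumed known)
my @dict_back = unpack 'C8', $blob;
my $ind_back  = substr $blob, 8;
my @restored  = map { $dict_back[ vec( $ind_back, $_, 4 ) ] } 0 .. 20;

print "@restored\n";
```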

If your data contains any common sequences of bytes, then you can store multi-byte sequences in the dictionary, represent each with a single index, and possibly get a greater saving. A cursory inspection shows two such sequences: 0, 54 and 54, 0 each appear twice, which means that you could reduce the number of indices by 2, saving 1 byte. But the dictionary would have to grow by 4 bytes to do it, so that doesn't help much either.
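Here is a minimal sketch for finding such candidates mechanically rather than by eye. It counts every adjacent pair in the sample data. It only looks at pairs (longer runs would need a wider window), and it does not account for overlapping pairs competing for the same bytes.

```perl
use strict;
use warnings;

my @data = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 );

# Count every adjacent pair of values
my %count;
$count{ join ',', @data[ $_, $_ + 1 ] }++ for 0 .. $#data - 1;

# Keep only pairs that occur more than once
my @repeats = grep { $count{$_} > 1 } sort keys %count;

print "$_ occurs $count{$_} times\n" for @repeats;
```

On this data it reports a few more twice-occurring pairs than the two mentioned. Several of them overlap, though: 0,28 and 28,54 both come from the run 0,28,54, so they cannot all be replaced at once.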

If your range of values were smaller, or there were more data and a greater chance of common sequences, then you might get better results. Unfortunately, your dataset could almost have been purposely chosen to be incompressible :)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to Re: pack unpack charcount repetition by BrowserUk
in thread pack unpack charcount repetition by denthijs
