Update:Removed errent print from code as pointed out by cciulla below.
Also, this was intended as slightly tongue in cheek answer in response to the (golf) nomenclature in the (original) title. I apologise for omiting the smiley. See my addendom below by way of recompemse.
</update>
How about 10:1 compression?
use Devel::Size qw[total_size];
my @a = ( 0,54,28,76,126,0,28,54,62,54,0,28,54,60,48,0,54,54,62,54,0 )
+;
my $packed = pack 'C*', @a;
print 'Memory requirement of ', total_size(\@a)
, ' bytes is reduced to ', total_size(\$packed), ' bytes';
Memory requirement of 476 bytes is reduced to 46 bytes.
Addendum
There are two basic was of compressing character data.
Bitwise reduction (Theres probably a better term). In this you reduce the storage requirement by using less than 8-bit per character. For example. If you only needed to represent uppercase alpha then you could get away with 5-bits/char
so you could get a 3/8 ths reduction by packing them into a bitstream
But as you have 21 bytes and a range of values 0-128, you would at best be saving 1-bit per byte. 21-bits saves 2-bytes! Hardly worth the effort.
Then there is the dictionary method that you tried yourself. In this, you build a dictionary of the common bytes (actually strings of common byte sequences work better, but your sample data is too short and varied for this to work well), and then represent the bytes by indexes into the dictionary. The problem as you saw is that representing 1 byte by another, Plus the dictionary, makes it worse rather than better. However, if you then use the first technique to reduce the storage requirement of the indices, then you get somewhere.
Your dictionary has 8 entries [0,28,48,54,60,62,76,126 ] (which is unfortunate. If it where only 7, the compression would be greater). That means you need 4 bits/byte for the indices.
Update 2 I was having a bad day. 8 values can be indexed with 3 bits! So 21*3/8 gives 8 bytes not 11, so a total of 16 bytes is all that is required. In addition 1-bit per byte if the dictionary could be shed saving another byte.
my @ind = (0,3,1,6,7,0,1,3,5,3,0,1,3,4,2,0,3,3,5,3,0);
my $ind = '';
vec( $ind, $_, 4) = $ind[$_] for 0 .. $#ind;
print length $ind; #print 11
This has allowed you to pack the 21 indicies into 11 bytes. But now you need to concatenate that with the dictionary of 8 bytes, and you are back to 19 bytes!
If your data contains any common sequences of bytes then you can store multi-byte sequences in the dictionary and represent them with a single index and possibly get a greater saving. A cursory inspection show two such sequences, of two repetitions each
0, 54 and 54, 0 both appear twice. which mean that you could reduce the number of indices by 2, saving 1 byte. But the dictionary would have to grow by 4 bytes to do it. So that doesn't help much either.
If your range of chars was less, or the there were more data and a greater chance common sequences, then you might get better results. Unfortunately, your dataset is such that it could almost have been purposely chosen to be uncompressable:)
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
|