in reply to pack unpack charcount repetition

hola;

from what i gather i think this is a case for good ol' Run Length Encoding (RLE):

here is RLE for an array in a couple of lines of not very strict code.

the basic idea is to look for runs of a single repeated symbol in your array, and replace each run with a single instance of the symbol plus a run count. it pays off when the data contains long runs, which is more likely if your set of symbols (numbers, in this case) is small.

here goes. the result is stored in @runlengths as array refs of the form [symbol, run length], starting at index 1.

@test = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
$run = 0;    # runs are stored from index 1; $runlengths[0] stays unused
for my $i (0 .. $#test) {
    if ($i > 0 && $test[$i] == $test[$i - 1]) {
        $runlengths[$run][1]++;                 # same symbol: extend the current run
    } else {
        $runlengths[++$run] = [$test[$i], 1];   # new symbol: start a new run
    }
}

ok, so what does it look like?

using
@strings = map { $runlengths[$_][0] . "x" . $runlengths[$_][1] } ( 1 .. $#runlengths);
gives us the set of strings

("23x2","4x1","8x1","21x1" ... "19x1")
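
for the round trip, here is a minimal sketch that expands strings of that "symbol x count" form back into the flat list. the sub name rle_expand is made up for illustration, and it assumes every string is well formed (a number, an "x", a count).

```perl
use strict;
use warnings;

# rle_expand is a hypothetical helper name, not part of the code above.
# Assumes each string looks like "<symbol>x<count>" with numeric parts.
sub rle_expand {
    my @strings = @_;
    my @out;
    for my $s (@strings) {
        my ($symbol, $count) = split /x/, $s, 2;
        push @out, ($symbol) x $count;    # repeat the symbol $count times
    }
    return @out;
}

my @expanded = rle_expand("23x2", "4x1", "90x4");
print "@expanded\n";    # 23 23 4 90 90 90 90
```

if the symbols themselves could contain an "x", a different separator (or just keeping the [symbol, count] pairs) would be the safer choice.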

and i guess no compression routine is complete without an extraction function, which i have not optimized much here. it takes a 0-based index into the original array and returns the symbol at that position:

sub extract {
    my $index = shift;
    my $lastindex = 0;                              # number of elements covered so far
    foreach my $run (@runlengths[1 .. $#runlengths]) {
        $lastindex += $$run[1];
        return $$run[0] if $index < $lastindex;     # $index falls inside this run
    }
    return undef;                                   # index is past the end of the data
}

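if lookups are frequent, one possible optimization is to precompute the cumulative end position of each run and binary search it instead of scanning. this is only a sketch under my own assumptions: the names rle_encode, build_ends and rle_lookup are hypothetical, and the runs here are 0-indexed, unlike @runlengths above.

```perl
use strict;
use warnings;

# Hypothetical names throughout: rle_encode, build_ends, rle_lookup.

# Encode a list of numbers as [symbol, count] pairs (0-indexed).
sub rle_encode {
    my @runs;
    for my $x (@_) {
        if (@runs && $runs[-1][0] == $x) {
            $runs[-1][1]++;               # extend the current run
        } else {
            push @runs, [$x, 1];          # start a new run
        }
    }
    return @runs;
}

# Cumulative end positions: $ends[$i] is the number of elements
# covered by runs 0 .. $i.
sub build_ends {
    my ($total, @ends) = (0);
    for my $run (@_) {
        $total += $run->[1];
        push @ends, $total;
    }
    return @ends;
}

# Binary search for the first run whose end position exceeds $index.
sub rle_lookup {
    my ($runs, $ends, $index) = @_;
    return if $index < 0 || !@$ends || $index >= $ends->[-1];
    my ($lo, $hi) = (0, $#$ends);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ends->[$mid] > $index) { $hi = $mid } else { $lo = $mid + 1 }
    }
    return $runs->[$lo][0];
}

my @test = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
my @runs = rle_encode(@test);
my @ends = build_ends(@runs);
print rle_lookup(\@runs, \@ends, 7), "\n";    # 90
```

the scan in extract is linear in the number of runs; the lookup here is logarithmic, at the cost of one extra array of offsets.
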
a final note: no discussion of compression and perl is complete without reference to the Mark Jason Dominus article on Huffman encoding, which would grace even a royal toilet:

http://perl.plover.com/Huffman/huffman.html

that is meant as sincere flattery btw.

hope that helps

...wufnik

-- in the world of the mules there are no rules --