RLE for simple array compression

hello;

from what i gather i think this is a case for good ol' Run Length Encoding (RLE):

here is RLE for an array in a couple of lines of not very strict code.

the basic idea is to look for runs of a single repeated symbol in your array, and replace each run with a single instance of the symbol and a run count. great if your data has long runs, which is more likely when your set of symbols (numbers, in this case) is small.

here goes. the result is stored in @runlengths as array refs of [value, count], starting at index 1.

@test = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
$run = 0;                          # runs are stored from index 1; slot 0 stays unused
foreach my $value (@test) {
    if ($run && $value == $last) {
        $runlengths[$run][1]++;    # same symbol as before: extend the current run
    } else {
        $runlengths[++$run] = [$value, 1];    # new symbol: start a new run
    }
    $last = $value;
}

ok, so what does it look like?

using

@strings = map {
    $runlengths[$_][0] . "x" . $runlengths[$_][1]
} (1 .. $#runlengths);

gives us the set of strings

("23x2","4x1","8x1","21x1" ... "19x1")
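it is worth checking whether the encoding actually pays off for your data, since every run costs two numbers. a minimal sketch, assuming a made-up helper rle_encode (not part of the code above) that returns a plain 0-based list of [value, count] pairs:

```perl
use strict;
use warnings;

# hypothetical helper: returns a 0-based list of [value, count] pairs
sub rle_encode {
    my @runs;
    for my $value (@_) {
        if (@runs && $runs[-1][0] == $value) {
            $runs[-1][1]++;            # same symbol: extend the current run
        } else {
            push @runs, [$value, 1];   # new symbol: start a new run
        }
    }
    return @runs;
}

my @test = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
my @runs = rle_encode(@test);

# each run is stored as two numbers (value and count)
printf "%d elements, %d runs, %d stored values\n",
    scalar @test, scalar @runs, 2 * @runs;
```

for the test array this prints "15 elements, 9 runs, 18 stored values", so RLE only wins once the runs get longer than they are in this sample.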

and i guess no compression routine is complete without an extraction function, which i have not optimized much here...

sub extract {
    my $index = shift;              # 0-based position in the original array
    my $lastindex = 0;              # number of elements covered by the runs seen so far
    foreach my $run (@runlengths[1 .. $#runlengths]) {
        $lastindex += $run->[1];
        return $run->[0] if $index < $lastindex;
    }
    return;                         # index is past the end of the data
}
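
if you want the whole array back rather than one element at a time, the runs can be expanded with the repetition operator. a minimal sketch, assuming the same layout as @runlengths (index 0 unused, each entry a [value, count] ref); the name rle_decode and the sample data are made up for this example:

```perl
use strict;
use warnings;

# sample data in the same layout as @runlengths: index 0 unused
my @runlengths = (undef, [23,2], [4,1], [8,1], [90,4]);

# hypothetical helper: expand every [value, count] pair back into a flat list
sub rle_decode {
    my @runs = @_;
    return map { ($_->[0]) x $_->[1] } @runs;
}

my @restored = rle_decode(@runlengths[1 .. $#runlengths]);
print "@restored\n";    # 23 23 4 8 90 90 90 90
```

the parentheses around $_->[0] matter: they make x repeat a list element instead of building a repeated string.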

a final note: no discussion of compression and perl is complete without reference to the Mark Jason Dominus article on Huffman encoding, which would grace even a royal toilet:

http://perl.plover.com/Huffman/huffman.html

that is meant as sincere flattery btw.

hope that helps

...wufnik

-- in the world of the mules there are no rules --
