hola;
from what i gather, i think this is a case for good ol' Run Length Encoding (RLE).
here is RLE for an array in a couple of lines of not very strict code.
the basic idea is to look for repeated runs of a single symbol in your array, and replace each run by a single instance of the symbol and a run count. this works well if your set of symbols (numbers, in this case) is small, since that makes long runs more likely.
here goes. the result is stored in @runlengths as array refs of the form [symbol, count], starting at index 1 (slot 0 stays unused).
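my @test = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
my (@runlengths, $last, $length);
my $run = 0;    # runs are stored from index 1; slot 0 stays unused
foreach my $value (@test) {
    # start a new run on the first element or whenever the value changes
    if (defined $last && $value == $last) { $length++ }
    else                                  { $run++; $length = 1 }
    $last = $value;
    $runlengths[$run] = [$value, $length];    # [symbol, count so far]
}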
ok, so what does it look like? using
@strings = map { $runlengths[$_][0] . "x" . $runlengths[$_][1] } (1 .. $#runlengths);
gives us the set of strings
("23x2","4x1","8x1","21x1" ... "19x1")
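for anyone who would rather have this under strict and warnings, here is a minimal sketch of the same idea as a function. rle_encode is a name i made up for illustration, and unlike @runlengths above it returns the runs starting at index 0.

```perl
use strict;
use warnings;

# rle_encode is a hypothetical helper name, not part of the code above.
# takes a list of numbers, returns a list of [value, count] pairs (0-based).
sub rle_encode {
    my @runs;
    for my $value (@_) {
        if (@runs && $runs[-1][0] == $value) {
            $runs[-1][1]++;              # same value: extend the current run
        }
        else {
            push @runs, [$value, 1];     # value changed: start a new run
        }
    }
    return @runs;
}

my @test    = (23,23,4,8,21,90,90,90,90,2,2,2,19,21,19);
my @runs    = rle_encode(@test);
my @strings = map { "$_->[0]x$_->[1]" } @runs;
print join(",", @strings), "\n";
# prints 23x2,4x1,8x1,21x1,90x4,2x3,19x1,21x1,19x1
```

comparing against the last stored run means there is no separate $last or $length to initialise, so an array that starts with 0 needs no special handling.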
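and i guess no compression routine is complete without an extraction function, which i have not optimized much here. it takes a 0-based index into the original array and returns the value at that position, or undef if the index is past the end.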
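sub extract {
    my $index = shift;
    my ($last, $lastindex) = (undef, 0);    # $lastindex = elements covered so far
    # walk the runs (stored from slot 1) until one covers the requested index
    foreach my $run (@runlengths[1 .. $#runlengths]) {
        ($last, $lastindex) = ($run->[0], $lastindex + $run->[1]);
        return $last if $index < $lastindex;
    }
    return undef;    # index is past the end of the original array
}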
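the linear scan in extract is fine for a handful of runs. if you have many runs and do many lookups, one possible refinement (a sketch, not something i have benchmarked) is to precompute the cumulative end offset of each run and binary search on that. rle_decode, run_ends and rle_at are hypothetical names, and the runs here are 0-based [value, count] pairs written out by hand.

```perl
use strict;
use warnings;

# hypothetical helpers, not from the code above; runs are [value, count] pairs.

# expand the runs back into the original flat list
sub rle_decode {
    my @out;
    push @out, ($_->[0]) x $_->[1] for @_;
    return @out;
}

# cumulative end offsets: $ends[$i] = number of elements covered by runs 0..$i
sub run_ends {
    my ($total, @ends) = (0);
    for my $run (@_) {
        $total += $run->[1];
        push @ends, $total;
    }
    return @ends;
}

# binary search for the first run whose end offset exceeds $index
sub rle_at {
    my ($runs, $ends, $index) = @_;
    return if $index < 0 || !@$ends || $index >= $ends->[-1];
    my ($lo, $hi) = (0, $#$ends);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ends->[$mid] > $index) { $hi = $mid }
        else                        { $lo = $mid + 1 }
    }
    return $runs->[$lo][0];
}

my @runs = ([23,2],[4,1],[8,1],[21,1],[90,4],[2,3],[19,1],[21,1],[19,1]);
my @ends = run_ends(@runs);
print join(",", rle_decode(@runs)), "\n";
# prints 23,23,4,8,21,90,90,90,90,2,2,2,19,21,19
print rle_at(\@runs, \@ends, 5), "\n";
# prints 90
```

the offsets cost one extra number per run, so this only pays off when lookups are frequent compared to rebuilding the runs.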
a final note:
no discussion of compression and perl is complete without reference to the Mark Jason Dominus article on Huffman encoding, which would grace even a royal toilet:
http://perl.plover.com/Huffman/huffman.html
that is meant as sincere flattery btw.
hope that helps
...wufnik
-- in the world of the mules there are no rules --