Now I found some time.
Here is the testing script, incorporating my random_data() into the original test scripts that LanX and tybalt89 posted:
Here is the statistical distribution of the 3-line data provided initially:
Frequencies:
$VAR1 = {
'B' => 44,
'A' => 93,
'C' => 49
};
$VAR1 = {
'C' => {
'C' => 25,
'B' => 5,
'A' => 19
},
'A' => {
'C' => 11,
'A' => 66,
'B' => 16
},
'B' => {
'C' => 16,
'A' => 6,
'B' => 22
}
};
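For reference, the two hashes above (single-letter counts and letter-to-letter transition counts) can be built with a short counting pass. This is a hypothetical sketch, not the original test script; count_frequencies() is a name I made up:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sketch: count single-letter frequencies and letter-pair
# (first-order transition) frequencies in a string of A/B/C data.
sub count_frequencies {
    my ($str) = @_;
    my ( %freq, %trans );
    my @letters = split //, $str;
    $freq{$_}++ for @letters;
    for my $i ( 0 .. $#letters - 1 ) {
        $trans{ $letters[$i] }{ $letters[ $i + 1 ] }++;
    }
    return ( \%freq, \%trans );
}

my ( $freq, $trans ) = count_frequencies('AABAC');
print Dumper($freq);     # counts of each letter
print Dumper($trans);    # counts of each letter-to-letter transition
```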
Probability Distribution:
$VAR1 = {
'B' => '0.236559139784946',
'A' => '0.5',
'C' => '0.263440860215054'
};
$VAR1 = {
'C' => {
'C' => '0.510204081632653',
'B' => '0.102040816326531',
'A' => '0.387755102040816'
},
'A' => {
'C' => '0.118279569892473',
'A' => '0.709677419354839',
'B' => '0.172043010752688'
},
'B' => {
'C' => '0.363636363636364',
'A' => '0.136363636363636',
'B' => '0.5'
}
};
Cumulative Probability distribution:
$VAR1 = {
'B' => '0.736559139784946',
'A' => '0.5',
'C' => '1'
};
$VAR1 = {
'C' => {
'C' => '1',
'B' => '0.489795918367347',
'A' => '0.387755102040816'
},
'A' => {
'C' => '1',
'A' => '0.709677419354839',
'B' => '0.881720430107527'
},
'B' => {
'C' => '1',
'A' => '0.136363636363636',
'B' => '0.636363636363636'
}
};
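The normalization step behind these numbers is simple: divide each count by the row total to get probabilities, then take a running sum over the keys (in sorted order, matching the dumps above, where the last key always reaches 1) to get the cumulative distribution. A hedged sketch, assuming the same flat hash shape as the first Dumper output; the sub names are my own:

```perl
use strict;
use warnings;

# Hypothetical sketch: turn raw counts into probabilities, then into a
# cumulative distribution ordered by key.
sub to_probabilities {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return { map { $_ => $counts->{$_} / $total } keys %$counts };
}

sub to_cumulative {
    my ($probs) = @_;
    my ( $sum, %cum ) = (0);
    for my $k ( sort keys %$probs ) {
        $sum += $probs->{$k};
        $cum{$k} = $sum;    # running total; last key reaches 1
    }
    return \%cum;
}

my $probs = to_probabilities( { A => 93, B => 44, C => 49 } );
my $cum   = to_cumulative($probs);
printf "%s p=%.3f cum=%.3f\n", $_, $probs->{$_}, $cum->{$_}
    for sort keys %$cum;
```

For the nested transition hashes, the same two subs would be applied to each inner hash in turn.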
And here are the compression comparisons:
------------------------------ Compression by gzip/gunzip
length of data 210168
length of compressed data 45076
compressed to 21.4%
MATCH
------------------------------ Compression by 2 bit code, 6 bit runlength
length of data 210168
length of compressed data 83690
compressed to 39.8%
MATCH
------------------------------ Compression by 2 bits per letter
length of data 210168
length of compressed data 52542
compressed to 25.0%
MATCH
------------------------------ Compression by groups of 5,2,1
length of data 210168
length of compressed data 42035
compressed to 20.0%
MATCH
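The "2 bits per letter" result is easy to sanity-check: with only three symbols (A, B, C), each letter fits in 2 bits instead of 8, so the packed data should come out at exactly 25% of the original. A minimal round-trip sketch using vec() (my own illustration, not the benchmarked implementation):

```perl
use strict;
use warnings;

# Hypothetical sketch: pack A/B/C at 2 bits per letter with vec(),
# then unpack and verify the round trip.
my %code   = ( A => 0, B => 1, C => 2 );
my %letter = reverse %code;

sub pack_2bit {
    my ($str) = @_;
    my $packed = '';
    my @l = split //, $str;
    vec( $packed, $_, 2 ) = $code{ $l[$_] } for 0 .. $#l;
    return $packed;
}

sub unpack_2bit {
    my ( $packed, $len ) = @_;
    return join '', map { $letter{ vec( $packed, $_, 2 ) } } 0 .. $len - 1;
}

my $data   = 'ABCABCAA' x 100;
my $packed = pack_2bit($data);
die "round trip failed" unless unpack_2bit( $packed, length $data ) eq $data;
printf "compressed to %.1f%%\n", 100 * length($packed) / length($data);
# prints "compressed to 25.0%"
```

The run-length variant (2-bit code plus 6-bit run length) only pays off when runs are long; on this data it lands at 39.8%, worse than plain 2-bit packing.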
bw, bliako