in reply to Re^4: a random_data() implementation
in thread How to efficently pack a string of 63 characters

Well thanks, you are free to test it against tybalt's code :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^6: a random_data() implementation
by bliako (Abbot) on Sep 10, 2021 at 17:32 UTC

    Now I found some time.

    Here is the testing script, which incorporates my random_data() into the original test script that LanX and tybalt89 posted:

    Here is the statistical distribution of the 3-line data provided initially:

    Frequencies:
    $VAR1 = {
      'B' => 44,
      'A' => 93,
      'C' => 49
    };
    $VAR1 = {
      'C' => { 'C' => 25, 'B' => 5, 'A' => 19 },
      'A' => { 'C' => 11, 'A' => 66, 'B' => 16 },
      'B' => { 'C' => 16, 'A' => 6, 'B' => 22 }
    };

    Probability Distribution:
    $VAR1 = {
      'B' => '0.236559139784946',
      'A' => '0.5',
      'C' => '0.263440860215054'
    };
    $VAR1 = {
      'C' => { 'C' => '0.510204081632653', 'B' => '0.102040816326531', 'A' => '0.387755102040816' },
      'A' => { 'C' => '0.118279569892473', 'A' => '0.709677419354839', 'B' => '0.172043010752688' },
      'B' => { 'C' => '0.363636363636364', 'A' => '0.136363636363636', 'B' => '0.5' }
    };

    Cumulative Probability distribution:
    $VAR1 = {
      'B' => '0.736559139784946',
      'A' => '0.5',
      'C' => '1'
    };
    $VAR1 = {
      'C' => { 'C' => '1', 'B' => '0.489795918367347', 'A' => '0.387755102040816' },
      'A' => { 'C' => '1', 'A' => '0.709677419354839', 'B' => '0.881720430107527' },
      'B' => { 'C' => '1', 'A' => '0.136363636363636', 'B' => '0.636363636363636' }
    };
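For readers who want to see how tables of this shape can be derived, here is a minimal sketch. It is not the random_data() code from this thread; the function names count_frequencies, to_probabilities and to_cumulative are made up for illustration, and the cumulative sum assumes the letters are accumulated in sorted order (A, B, C), which is what the figures above suggest.

```perl
use strict;
use warnings;

# Count single-letter frequencies and letter-to-next-letter transitions.
sub count_frequencies {
    my @lines = @_;
    my (%freq, %trans);
    for my $line (@lines) {
        my @chars = split //, $line;
        for my $i (0 .. $#chars) {
            $freq{ $chars[$i] }++;
            $trans{ $chars[$i - 1] }{ $chars[$i] }++ if $i > 0;
        }
    }
    return (\%freq, \%trans);
}

# Turn a { letter => count } hash into probabilities that sum to 1.
sub to_probabilities {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return { map { $_ => $counts->{$_} / $total } keys %$counts };
}

# Turn probabilities into a running sum over the sorted letters.
sub to_cumulative {
    my ($probs) = @_;
    my $sum = 0;
    my %cum;
    for my $k (sort keys %$probs) {
        $sum += $probs->{$k};
        $cum{$k} = $sum;
    }
    return \%cum;
}

# Usage with the single-letter counts posted above.
my $probs = to_probabilities({ A => 93, B => 44, C => 49 });
my $cum   = to_cumulative($probs);
printf "%s %.4f\n", $_, $cum->{$_} for sort keys %$cum;
```

The same two conversion functions can be applied to each row of the transition table to get the per-letter cumulative rows.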

    And here are the compression comparisons:

    ------------------------------ Compression by gzip/gunzip
    length of data 210168
    length of compressed data 45076
    compressed to 21.4%
    MATCH
    ------------------------------ Compression by 2 bit code, 6 bit runlength
    length of data 210168
    length of compressed data 83690
    compressed to 39.8%
    MATCH
    ------------------------------ Compression by 2 bits per letter
    length of data 210168
    length of compressed data 52542
    compressed to 25.0%
    MATCH
    ------------------------------ Compression by groups of 5,2,1
    length of data 210168
    length of compressed data 42035
    compressed to 20.0%
    MATCH
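As an illustration of the "2 bits per letter" row, here is a rough sketch of fixed-width packing with Perl's built-in vec(). It is not the code that produced the numbers above; the names pack2 and unpack2 and the letter-to-code mapping are assumptions made for this example. Four letters fit in one byte, which is consistent with 210168 letters shrinking to 210168 / 4 = 52542 bytes.

```perl
use strict;
use warnings;

# Assumed mapping of the three letters onto 2-bit codes.
my %code   = (A => 0, B => 1, C => 2);
my @letter = ('A', 'B', 'C');

# Pack a string of A/B/C into 2 bits per letter.
sub pack2 {
    my ($text) = @_;
    my $packed = '';
    my $i = 0;
    vec($packed, $i++, 2) = $code{$_} for split //, $text;
    return $packed;
}

# Unpack; the original length must be supplied because the last
# byte may contain padding.
sub unpack2 {
    my ($packed, $length) = @_;
    return join '', map { $letter[ vec($packed, $_, 2) ] } 0 .. $length - 1;
}

my $text   = 'ACAAAABBBCCCAAAABBBBCC';
my $packed = pack2($text);
printf "%d letters -> %d bytes\n", length $text, length $packed;
print "MATCH\n" if unpack2($packed, length $text) eq $text;
```

The length has to be kept alongside the packed bytes, since a trailing partial byte cannot be told apart from trailing A's.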

    bw, bliako

      Did you also reproduce the chunks of the same character?

      Or at least di- and trigrams?

      zip does run-length encoding.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        If I understand you correctly: yes. The 3-line data contained runs like AAAAAAA, which my generator picked up, so it produced lines like ACAAAABBBCCCAAAABBBBCCAAABCAAAAAAABBBBBBBBBBBBBCAAAACAAABCAABCA

        $data = join "", @data, @$random_data; # remove \n; they can be re-inserted later
        # Add this to print all data
        print join("\n", @$random_data);
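To show why a generator driven by transition probabilities tends to reproduce runs of the same letter, here is a small sketch of sampling from a cumulative transition table. It is not bliako's random_data(); next_letter and random_line are hypothetical names, and the table holds the cumulative figures from earlier in the thread rounded to four decimals.

```perl
use strict;
use warnings;

# Cumulative transition table: for each current letter, the running
# sum of probabilities for the next letter in sorted order.
my %cum = (
    A => { A => 0.7097, B => 0.8817, C => 1 },
    B => { A => 0.1364, B => 0.6364, C => 1 },
    C => { A => 0.3878, B => 0.4898, C => 1 },
);

# Return the first letter whose cumulative value exceeds a uniform draw.
sub next_letter {
    my ($row) = @_;
    my $r = rand();
    for my $k (sort keys %$row) {
        return $k if $r < $row->{$k};
    }
    return (sort keys %$row)[-1];    # guard against rounding
}

# Build one line, choosing each letter based on the previous one.
sub random_line {
    my ($length, $start) = @_;
    my $line = $start;
    $line .= next_letter($cum{ substr($line, -1) }) while length($line) < $length;
    return $line;
}

print random_line(63, 'A'), "\n";
```

Because A is followed by A about 71% of the time in this table, long A runs appear naturally. Only digram statistics are modelled here; trigrams would need a table keyed on the previous two letters.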