in reply to Re^6: a random_data() implementation
in thread How to efficently pack a string of 63 characters

Did you also reproduce the chunks of the same character?

Or at least di- and trigrams?

zip does run-length-encoding.
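
One way to check this is to compare run-length and n-gram statistics of the original and the generated data. Below is a minimal sketch; `run_lengths` and `ngrams` are hypothetical helper names, and the sample string is only an illustration, not the data from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: histogram of run lengths, where a run is a
# maximal chunk of one repeated character.
sub run_lengths {
    my ($s) = @_;
    my %hist;
    $hist{ length $1 }++ while $s =~ /((.)\2*)/gs;
    return \%hist;
}

# Hypothetical helper: counts of overlapping n-grams (n=2: digrams, n=3: trigrams).
sub ngrams {
    my ($s, $n) = @_;
    my %count;
    $count{ substr $s, $_, $n }++ for 0 .. length($s) - $n;
    return \%count;
}

# Illustrative sample only.
my $sample = 'ACAAAABBBCCCAAAABBBBCC';
my $runs   = run_lengths($sample);
my $di     = ngrams($sample, 2);

print "run length $_: $runs->{$_}\n" for sort { $a <=> $b } keys %$runs;
print "digram $_: $di->{$_}\n"       for sort keys %$di;
```

If both histograms look alike for the real and the generated data, the generator reproduces the structure a compressor could exploit.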

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^8: a random_data() implementation
by bliako (Abbot) on Sep 10, 2021 at 20:33 UTC

    If I understand you correctly: yes. The 3-line data contained runs like AAAAAAA, which my generator picked up, producing lines like ACAAAABBBCCCAAAABBBBCCAAABCAAAAAAABBBBBBBBBBBBBCAAAACAAABCAABCA

    $data = join "", @data, @$random_data; # remove \n they can be re-inserted later

    # Add this to print all data
    print join("\n", @$random_data);
      Thanks!

      I looked into it and could reproduce your results.

      FWIW I tried best compression for gzip

      use IO::Compress::Gzip qw(gzip :constants);
      sub compgzip {
          gzip \(shift) => \(my $output), -Level => Z_BEST_COMPRESSION;
          $output;
      }

      use IO::Uncompress::Gunzip qw(gunzip);
      sub uncompgzip {
          gunzip \(shift) => \(my $output);
          $output;
      }

      and got

      ------------------------------
      Compression by gzip/gunzip
      length of data 210168
      length of compressed data 42210
      compressed to 20.1%
      MATCH
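
      A report in this shape could be produced roughly as follows. This is only a sketch: the data is a synthetic stand-in (an assumption, not the 210168-byte data set above), so the numbers it prints will differ.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError :constants);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Assumption: synthetic stand-in data made of runs of A, B and C,
# NOT the data set discussed in this thread.
my @abc  = qw(A B C);
my $data = '';
$data .= $abc[ $_ % 3 ] x ( 1 + $_ % 7 ) for 1 .. 5000;

# Compress at the best level, then decompress again for the round-trip check.
gzip \$data => \my $packed, -Level => Z_BEST_COMPRESSION
    or die "gzip failed: $GzipError";
gunzip \$packed => \my $unpacked
    or die "gunzip failed: $GunzipError";

print  "Compression by gzip/gunzip\n";
printf "length of data %d\n",            length $data;
printf "length of compressed data %d\n", length $packed;
printf "compressed to %.1f%%\n",         100 * length($packed) / length($data);
print  $unpacked eq $data ? "MATCH\n" : "MISMATCH\n";
```

      The final line compares the decompressed output with the input, which is what MATCH stands for in the report.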

      update

      I noticed that -Strategy => Z_RLE alone already compressed to 20.9%, so my theory is that your runs are so homogeneously distributed that the second-phase Huffman coding couldn't squeeze more than another 0.8 percentage points out of it.
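
      For anyone who wants to repeat the comparison, here is a minimal sketch that compresses the same string with different zlib strategies. `gzip_size` is a hypothetical helper, and the pseudo-random runs are stand-in data (an assumption), so the percentages will not match the ones above.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError :constants);

# Hypothetical helper: compressed size in bytes for a given zlib strategy.
sub gzip_size {
    my ($data, $strategy) = @_;
    gzip \$data => \my $out,
        -Level    => Z_BEST_COMPRESSION,
        -Strategy => $strategy
        or die "gzip failed: $GzipError";
    return length $out;
}

# Assumption: pseudo-random runs of A, B and C as stand-in data.
srand 42;
my @abc  = qw(A B C);
my $data = '';
$data .= $abc[ int rand 3 ] x ( 1 + int rand 8 ) for 1 .. 20000;

my %strategy = (
    default      => Z_DEFAULT_STRATEGY(),
    rle          => Z_RLE(),
    huffman_only => Z_HUFFMAN_ONLY(),
);

for my $name (sort keys %strategy) {
    my $size = gzip_size($data, $strategy{$name});
    printf "%-13s compressed to %.1f%%\n", $name, 100 * $size / length $data;
}
```

      Running it on the real data instead of the stand-in shows how much each strategy contributes.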

      Cheers Rolf

      OK, strange ...

      .. I would have expected zip to perform better then.

      Cheers Rolf