in reply to Re^2: How to efficently pack a string of 63 characters
in thread How to efficently pack a string of 63 characters
to prove my point, here an altered version of your code which fakes longer data by rotating the original input and showing zip at 5% compression. That's factor 4 better than your champion.
(Of course is rotating kind of biased, because it keeps most run length chunks intact and zip will efficiently Huffmann all Runs it finds.° But it's up to the OP to provide unbiased data, I'm no psychic... ;-)
FWIW: zip is already at 18% after only tripling the input.
use strict; # https://perlmonks.org/?node_id=11136582 use warnings; # originally https://perlmonks.org/?displ +aytype=displaycode;node_id=11136614;abspart=1;part=1 use feature 'say'; for my $try ( [ 'gzip/gunzip', \&compgzip, \&uncompgzip ], [ '2 bit code, 6 bit runlength', \&comp62, \&uncomp62 ], [ '2 bits per letter', \&comp2bits, \&uncomp2bits ], [ 'groups of 5,2,1', \&comp5, \&uncomp5 ], ) { my ($method, $comp, $uncomp) = @$try; print "\n------------------------------ Compression by $method\n\n +"; my $data = <<END; ABBCBCAAAAABBCBCACCCAAAAACAAAAABBBBBAAAAABBAAAAAAAABBCCCACCAABC BCCCBCAACAABBBCAAACCAAAAACAAAAABBBBBAAAAABBAAAAAAAABBCCCACCABBC ABCCBBBAAAABBABCACABCCCCCCAAAAABBCBBCCCCAAAAAAAAAAAAACCCACCACCC END my @data = split/\n/, $data; push @data, map{rotate($data[$_])} 0..2 for 1..100; # fake x1 +00 times data $data = join "", @data; # remove \n they can be re-inserted later print "length of data @{[ $data =~ tr/ABC// ]}\n"; my $compressed = $comp->($data); print "length of compressed data @{[ length $compressed ]}\n"; #use Data::Dump 'dd'; dd $compressed; # print unpack('H*', $compressed), "\n"; my $uncompressed = $uncomp->($compressed); printf "compressed to %.1f%%\n", 100 * length($compressed) / length $uncompressed; print $data eq $uncompressed ? "MATCH" : "************ no MATCH", +"\n"; } # by lanx sub rotate { # fake data from original samp +le # say "O: ", my $orig =shift; # say my $rnd = int rand 63; my $head = substr $orig,0,$rnd,""; # say "N: ", my $new = $orig.$head; return $new; } # compress by groups of 5,2,1 to single letter sub comp5 { my @code = map glob('{A,B,C}' x $_), 5, 2, 1; my %code; @code{@code} = map chr, 1 .. @code; local $" = '|'; shift =~ s/(@code)/$code{$1}/gr } sub uncomp5 { my @code = map glob('{A,B,C}' x $_), 5, 2, 1; my %code; @code{map chr, 1 .. @code} = @code; join '', @code{split //, shift}; } # compress by lower two bits of letter sub comp2bits { my ($ans, $n) = ('', 0); vec($ans, $n++, 2) = 3 & ord $_ for split //, shift; $ans; } sub uncomp2bits { my $comp = shift; join '', map { ('', 'A', 'B', 'C')[ vec $comp, $_, 2] } 0 .. -1 + 4 * length $comp; } # compress by runlength or 6 bits length and 2 bits letter code sub comp62 { shift =~ s/([ABC])\1{0,62}/ chr( length($&) << 2 | ord($1) & 3) +/ger; } sub uncomp62 { shift =~ s/./ (($& & "\3") | '@') x (ord($&) >> 2) /gesr; } # compress by gzip use IO::Compress::Gzip qw(gzip); sub compgzip { gzip \(shift) => \(my $output); $output; } use IO::Uncompress::Gunzip qw(gunzip); sub uncompgzip { gunzip \(shift) => \(my $output); $output; }
-*- mode: compilation; default-directory: "d:/tmp/pm/" -*- Compilation started at Fri Sep 10 13:25:18 C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/pack_63_chars.pl ------------------------------ Compression by gzip/gunzip length of data 19089 length of compressed data 1021 compressed to 5.3% MATCH ------------------------------ Compression by 2 bit code, 6 bit runlen +gth length of data 19089 length of compressed data 7714 compressed to 40.4% MATCH ------------------------------ Compression by 2 bits per letter length of data 19089 length of compressed data 4773 compressed to 25.0% MATCH ------------------------------ Compression by groups of 5,2,1 length of data 19089 length of compressed data 3819 compressed to 20.0% MATCH Compilation finished at Fri Sep 10 13:25:18
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
°) even after constantly reversing one part of the input I'm at 9% compression for factor 100 input.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: a random_data() implementation
by bliako (Abbot) on Sep 10, 2021 at 14:11 UTC | |
by LanX (Saint) on Sep 10, 2021 at 15:41 UTC | |
by bliako (Abbot) on Sep 10, 2021 at 17:32 UTC | |
by LanX (Saint) on Sep 10, 2021 at 19:23 UTC | |
by bliako (Abbot) on Sep 10, 2021 at 20:33 UTC | |
| |
|
Re^4: How to efficently pack a string of 63 characters (zip worst case)
by LanX (Saint) on Sep 10, 2021 at 13:09 UTC | |
by tybalt89 (Monsignor) on Sep 10, 2021 at 15:26 UTC |