in reply to Re^2: Efficient 7bit compression
in thread Efficient 7bit compression

This keeps the overhead constant by expanding only 8 chars into a bit string at a time, rather than the whole input.
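sub shrink {
    my $packed = '';
    # Take at most 8 chars per pass so the bit string stays small.
    while ($_[0] =~ /(.{1,8})/gs) {
        my $bitstr = unpack('B*', $1);
        $bitstr =~ s/.(.{7})/$1/g;    # drop the high bit of every byte
        $packed .= pack('B*', $bitstr);
    }
    $packed;
}

sub grow {
    my $unpacked = '';
    # 7 packed bytes expand back to 8 chars.
    # Note: if the original length was not a multiple of 8, the padding
    # bits of the last chunk come back as one trailing "\0".
    while ($_[0] =~ /(.{1,7})/gs) {
        my $bitstr = unpack('B*', $1);
        $bitstr =~ s/(.{7})/0$1/g;    # put a 0 high bit in front of every 7 bits
        $unpacked .= pack('B*', $bitstr);
    }
    $unpacked;
}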
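To make the bit shuffling concrete, here is a small walk-through of what one chunk goes through. It is only a sketch using the same pack/unpack calls as above; the variable names are mine, and the second half shows what happens to a short final chunk.

```perl
use strict;
use warnings;

# One full chunk: 8 seven-bit chars become 7 bytes.
my $chunk  = 'ABCDEFGH';
my $bitstr = unpack('B*', $chunk);    # 64 chars of '0'/'1', high bit first
$bitstr =~ s/.(.{7})/$1/g;            # drop the high bit of each byte: 56 bits
my $packed = pack('B*', $bitstr);     # 56 bits pack into 7 bytes
printf "%d chars -> %d bits -> %d bytes\n",
    length($chunk), length($bitstr), length($packed);

# A short final chunk: pack('B*') pads to a whole byte, and the
# padding bit is still there when the chunk is expanded again.
my $tail = unpack('B*', 'A');         # 01000001
$tail =~ s/.(.{7})/$1/g;              # 1000001 (7 bits)
my $tail_packed = pack('B*', $tail);  # padded on the right: 10000010
my $back = unpack('B*', $tail_packed);
$back =~ s/(.{7})/0$1/g;              # 010000010 (9 bits)
my $restored = pack('B*', $back);     # "A" followed by "\0"
printf "restored length: %d\n", length($restored);
```

If the trailing "\0" matters, one option is to store the original length alongside the packed data and truncate after growing.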

Caution: Contents may have been coded under pressure.

Re^4: Efficient 7bit compression
by Limbic~Region (Chancellor) on Mar 14, 2005 at 16:14 UTC
    Roy Johnson,
    I tried a similar approach, only using vec() and starting 1 bit back from the end of the string each time, but I got bizarre results. One idea I would have pursued, had I been able to make it work, is to pre-size the strings and use 4-arg substr instead of growing them each time.
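For anyone trying the vec() route: one thing that can surprise is that vec() with 1-bit elements counts from the low bit of each byte, so it lines up with unpack 'b*', not 'B*'. The sketch below shows that, followed by a hypothetical vec()-based packer. shrink_vec, grow_vec and the explicit $count argument are my own illustration, not L~R's code, and the output is not byte-compatible with the 'B*' version above.

```perl
use strict;
use warnings;

# Bit order: vec(..., 1) matches 'b*' (low bit first), not 'B*'.
my $s = 'A';                                          # 0x41
print unpack('B*', $s), "\n";                         # 01000001
print unpack('b*', $s), "\n";                         # 10000010
print join('', map { vec($s, $_, 1) } 0 .. 7), "\n";  # 10000010

# Hypothetical packer: copy bits 0..6 of each char, skip bit 7 (the high bit).
sub shrink_vec {
    my ($in) = @_;
    my $out = '';
    for my $i (0 .. length($in) - 1) {
        for my $j (0 .. 6) {
            vec($out, 7 * $i + $j, 1) = vec($in, 8 * $i + $j, 1);
        }
    }
    return $out;
}

# Hypothetical unpacker: $count is the original char count (an assumption
# of this sketch), which avoids guessing whether padding bits are data.
sub grow_vec {
    my ($in, $count) = @_;
    my $out = "\0" x $count;    # pre-sized, high bits already 0
    for my $i (0 .. $count - 1) {
        for my $j (0 .. 6) {
            vec($out, 8 * $i + $j, 1) = vec($in, 7 * $i + $j, 1);
        }
    }
    return $out;
}

my $text   = 'Hello, world';
my $packed = shrink_vec($text);
printf "%d -> %d bytes\n", length($text), length($packed);
print grow_vec($packed, length $text), "\n";
```

Bit-at-a-time vec() calls are likely much slower than the pack/unpack version, so this is more for illustration than for speed.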

    Cheers - L~R

      Are you talking about shrinking in place, so you don't have the overhead of having the compressed and uncompressed strings in memory at the same time?
      my $str = <<'EOS';
      There once was a man from Nantucket,
      Who kept all his cash in a bucket.
      His daughter, named Nan,
      Ran off with a man.
      And as for the bucket, Nantucket.
      EOS

      sub shrink_in_place {
          # Each pass replaces 8 chars with 7 bytes, so advance by 7.
          for (my $i = 0; $i < length($_[0]); $i += 7) {
              # substr() is an lvalue here; assigning to $_ rewrites the string.
              for (substr($_[0], $i, 8)) {
                  # grep applies the s/// to the bit string and passes it on
                  $_ = pack('B*', grep s/.(.{7})/$1/g, unpack('B*', $_));
              }
          }
      }

      sub grow_in_place {
          # Each pass replaces 7 bytes with 8 chars, so advance by 8.
          for (my $i = 0; $i < length($_[0]); $i += 8) {
              for (substr($_[0], $i, 7)) {
                  $_ = pack('B*', grep s/(.{7})/0$1/g, unpack('B*', $_));
              }
          }
      }

      print "$str";
      printf "Length is %d\n", length($str);
      shrink_in_place($str);
      printf "Shrunk length is %d\n", length($str);
      grow_in_place($str);
      print "Restored: $str";

      Caution: Contents may have been coded under pressure.
        Roy Johnson,
        Are you talking about shrinking in place, so you don't have the overhead of having the compressed and uncompressed strings in memory at the same time?

        No - but neat. What I was talking about was with regard to $unpacked = ''. Rather than growing it on every pass, pre-size it and use the 4-arg substr to fill it in. As I said before, I didn't get my vec() approach to work, but I would have been able to do the same in the shrink() sub.
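A rough sketch of what the pre-sizing idea could look like for grow(). The name grow_presized and the sizing arithmetic are my own assumptions, not code from this thread, and whether it actually beats .= would need a benchmark.

```perl
use strict;
use warnings;

# Sketch: same chunking as grow(), but the output buffer is allocated
# once and filled with 4-arg substr instead of being grown with .=
sub grow_presized {
    # Every chunk of up to 7 packed bytes expands to at most 8 bytes.
    my $chunks   = int((length($_[0]) + 6) / 7);
    my $unpacked = "\0" x (8 * $chunks);
    my $pos      = 0;
    while ($_[0] =~ /(.{1,7})/gs) {
        my $bitstr = unpack('B*', $1);
        $bitstr =~ s/(.{7})/0$1/g;
        my $chunk = pack('B*', $bitstr);
        # 4-arg substr: overwrite length($chunk) bytes at $pos in place
        substr($unpacked, $pos, length($chunk), $chunk);
        $pos += length($chunk);
    }
    substr($unpacked, $pos) = '';    # trim whatever was not filled
    return $unpacked;
}

# Usage: pack 16 chars the same way shrink() does, then expand them.
my $in   = 'ABCDEFGHIJKLMNOP';
my $bits = unpack('B*', $in);
$bits =~ s/.(.{7})/$1/g;
my $packed = pack('B*', $bits);
print grow_presized($packed), "\n";
```

The same pattern would apply to shrink(), with a buffer of 7 bytes per 8 input chars.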

        Cheers - L~R