Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

RFC: Compress::LZW

by meredith (Friar)
on Jun 20, 2003 at 00:28 UTC ( [id://267408]=perlquestion: print w/replies, xml ) Need Help??

meredith has asked for the wisdom of the Perl Monks concerning the following question:

I threw this module together using a perl implementation of LZW from Compress::SelfExtracting. As this will be my first published module, I thought I'd RFC it before it goes live. If you're wondering how this module came about, check here -> Any news on Compress::LZW?. Obviously, there are improvements in the works, like using blocking, making a best-guess decision on table size, and auto-detection of table size through magic headers, but this is just the first. :) Currently, using a 16-bit codeword table, the compression croaks at 65,536 codewords. Happy birthday, free LZW!

I believe I want to make this module move toward a format compatible with compress -- currently it is not...
############################################################ package Compress::LZW; require Exporter; use Carp; use vars qw/@ISA @EXPORT $VERSION/; use warnings; use strict; @EXPORT = qw/compress decompress/;; @ISA = qw/Exporter/; $VERSION = 0.01; my (%LZ, %UNLZ); %LZ = (12 => sub { my $v = ''; for my $i (0..$#_) { vec($v, 3*$i, 4) = $_[$i]/256; vec($v, 3*$i+1, 4) = ($_[$i]/16)%16; vec($v, 3*$i+2, 4) = $_[$i]%16; } $v; }, 16 => sub { pack 'S*', @_ }); %UNLZ = (12 => sub { my $code = shift; my @code; my $len = length($code); my $reallen = 2*$len/3; foreach (0..$reallen - 1) { push @code, (vec($code, 3*$_, 4)<<8) | (vec($code, 3*$_+1, 4)<<4) | (vec($code, 3*$_+2, 4)); } @code; }, 16 => sub { unpack 'S*', shift; }); sub compress { my ($str, $bits) = @_; $bits = $bits ? $bits : 16; my $p = ''; my %d = map{(chr $_, $_)} 0..255; my @o = (); my $ncw = 256; for (split '', $str) { if (exists $d{$p.$_}) { $p .= $_; } else { push @o, $d{$p}; $d{$p.$_} = $ncw++; $p = $_; } } push @o, $d{$p}; if ($bits != 16 && $ncw < 1<<12) { $bits = 12; return $LZ{12}->(@o); } elsif ($ncw < 1<<16) { $bits = 16; return $LZ{16}->(@o); } else { croak "Sorry, code-word overflow"; } } sub decompress { my ($str, $bits) = @_; $bits = $bits ? $bits : 16; my %d = (map{($_, chr $_)} 0..255); my $ncw = 256; my $ret = ''; my ($p, @code) = $UNLZ{$bits}->($str); $ret .= $d{$p}; for (@code) { if (exists $d{$_}) { $ret .= $d{$_}; $d{$ncw++} = $d{$p}.substr($d{$_}, 0, 1); } else { my $dp = $d{$p}; unless ($_ == $ncw++) { carp "($_ == $ncw)?! Check your ta +ble size!" }; $ret .= ($d{$_} = $dp.substr($dp, 0, 1)); } $p = $_; } $ret; } ############################################################ 1; __END__ =head1 NAME Compress::LZW -- Pure perl implementation of LZW =head1 SYNOPSIS use Compress::LZW; my $compressed = compress($fatdata); my $fatdata = decompress($compressed); my $smallcompressed = compress($thindata, 12); my $thindata = decompress($smallcompressed, 12); =head1 DESCRIPTION C<Compress::LZW> it a perl implementation of the newly free LZW compression algorithm. It defaults to building a 16-bit codeword table, but provides the ability to choose a 12-bit table also. Depending on the size of your data, the 12-bit table may provide better compression. =head2 Functions =over =item C<compress> Takes a string as its first argument, and returns the compressed result. You can also specify the size of your codeword table in C<@_[1]>, choosing either 12 or 16. 16 is the default. C<compress> will =item C<decompress> Takes a string as its first argument, and returns the decompressed result. You can also specify the size of your codeword table in @_[1], choosing either 12 or 16. 16 is the default. =back =head1 EXPORTS C<Compress::LZW> exports: C<compress> C<decompress> That's all. =head1 SEE ALSO Other Compress::* modules, especially Compress::LZV1, Compress::LZF an +d Compress::Zlib. =head1 AUTHOR Sean O'Rourke, E<lt>seano@cpan.orgE<gt> - Original author, C<Compress: +:SelfExtracting> Matt Howard E<lt>mhoward@hattmoward.orgE<gt> - C<Compress::LZW> Bug reports welcome, patches even more welcome. =head1 COPYRIGHT Copyright (C) 2003 Sean O'Rourke & Matt Howard. All rights reserved, +some wrongs reversed. This module is distributed under the same terms as Perl itself. Let me know if you actually find it useful. =cut
Thanks,

mhoward - at - hattmoward.org

Replies are listed 'Best First'.
Re: RFC: Compress::LZW
by dash2 (Hermit) on Jun 20, 2003 at 00:53 UTC
    My god, it's like Open Source 0-day warez! :-)

    This will be great, perfect for packaging perl software to people in a nice, self-extracting file with no library requirements.

    A massive flamewar beneath your chosen depth has not been shown here

      ??? I think your comment was meant for Compress::SelfExtracting, but that would have requirements anyway... namely a perl interpreter. I don't believe it applies the SelfExtracting to any of the libraries anyway. Also, it's okay to publish the code, it's just that you guys can't use it... until tomorrow ;)

      mhoward - at - hattmoward.org
        Well actually, I was thinking of doing the self-extracting bit myself - just a perl script with some __DATA__ .... And yes, it's for perl scripts on the server, I didn't think that you could create a binary with it or something. (And I was joking about the warez.)
        A massive flamewar beneath your chosen depth has not been shown here
Re: RFC: Compress::LZW
by diotalevi (Canon) on Jun 20, 2003 at 02:10 UTC

    Please, please don't publish this until it handles .Z files correctly. That's the whole point to having this library (from my perspective).

      I understand your perspective, but there isn't much more to do to get it throwing out .Z's right and left. This is open-source, the idea is that someone else can look at my code and contribute. If someone doesn't heed the version number, the alpha status on CPAN, or the noted incompatibility with .Z files; aren't they asking for trouble? Development, man, development. Sure, someone CAN use the module for some nefarious perpose right now, but why do that when it will get better?

      Anyway, I'm looking into the compress format. Lots to get sorted out though. It will take a little tweaking for the adaptive blocking to work as well as compress.

      I have noted your desire for a compress-compatible format; trust me, I'm working. :)

      mhoward - at - hattmoward.org
Re: RFC: Compress::LZW
by Anonymous Monk on Jun 20, 2003 at 00:51 UTC
    Agh! you've doomed us all! You just had to go and post it before the patent runs out... quick somebody change the date on it to tommorrow before 'they' see.

    hehehe. Alright I've had my fun :).
Re: RFC: Compress::LZW
by meredith (Friar) on Jun 20, 2003 at 12:10 UTC
    Before I upload this, I'm going to add a very obvious note that this isn't compatible with .Z files, nor is it currently stabilized in any way.

    Also, I'm going to move this module to an OO interface (with some exported functions for the hardcore). This will provide a more plausible method for adaptive blocking, instead of a program having to feed _all_ of its data at once, we can "stream", just like compress does. OO will also provide for XS (well, more easily:) in the future. Thanks for all the input so far everyone!

    mhoward - at - hattmoward.org
      Might be also worth noting that it's only the US patent that has expired - IIRC, the international patent (e.g., the one affecting us elsewhere in the world ;) is in effect for another year yet. While it's unlikely to have any real legal implications, it should be noted.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://267408]
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-03-29 11:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found