meredith has asked for the wisdom of the Perl Monks concerning the following question:
I threw this module together using a perl implementation of LZW from Compress::SelfExtracting. As this will be my first published module, I thought I'd RFC it before it goes live. If you're wondering how this module came about, check here -> Any news on Compress::LZW?. Obviously, there are improvements in the works, like using blocking, making a best-guess decision on table size, and auto-detection of table size through magic headers, but this is just the first. :) Currently, using a 16-bit codeword table, the compression croaks at 65,536 codewords. Happy birthday, free LZW!
I believe I want to make this module move toward a format compatible with compress -- currently it is not...
############################################################
package Compress::LZW;
require Exporter;
use Carp;
use vars qw/@ISA @EXPORT $VERSION/;
use warnings;
use strict;
@EXPORT = qw/compress decompress/;;
@ISA = qw/Exporter/;
$VERSION = 0.01;
my (%LZ, %UNLZ);
%LZ = (12 => sub {
my $v = '';
for my $i (0..$#_) {
vec($v, 3*$i, 4) = $_[$i]/256;
vec($v, 3*$i+1, 4) = ($_[$i]/16)%16;
vec($v, 3*$i+2, 4) = $_[$i]%16;
}
$v;
},
16 => sub { pack 'S*', @_ });
%UNLZ = (12 => sub {
my $code = shift;
my @code;
my $len = length($code);
my $reallen = 2*$len/3;
foreach (0..$reallen - 1) {
push @code, (vec($code, 3*$_, 4)<<8)
| (vec($code, 3*$_+1, 4)<<4)
| (vec($code, 3*$_+2, 4));
}
@code;
},
16 => sub { unpack 'S*', shift; });
sub compress {
my ($str, $bits) = @_;
$bits = $bits ? $bits : 16;
my $p = '';
my %d = map{(chr $_, $_)} 0..255;
my @o = ();
my $ncw = 256;
for (split '', $str) {
if (exists $d{$p.$_}) {
$p .= $_;
} else {
push @o, $d{$p};
$d{$p.$_} = $ncw++;
$p = $_;
}
}
push @o, $d{$p};
if ($bits != 16 && $ncw < 1<<12) {
$bits = 12;
return $LZ{12}->(@o);
} elsif ($ncw < 1<<16) {
$bits = 16;
return $LZ{16}->(@o);
} else {
croak "Sorry, code-word overflow";
}
}
sub decompress {
my ($str, $bits) = @_;
$bits = $bits ? $bits : 16;
my %d = (map{($_, chr $_)} 0..255);
my $ncw = 256;
my $ret = '';
my ($p, @code) = $UNLZ{$bits}->($str);
$ret .= $d{$p};
for (@code) {
if (exists $d{$_}) {
$ret .= $d{$_};
$d{$ncw++} = $d{$p}.substr($d{$_}, 0, 1);
} else {
my $dp = $d{$p};
unless ($_ == $ncw++) { carp "($_ == $ncw)?! Check your ta
+ble size!" };
$ret .= ($d{$_} = $dp.substr($dp, 0, 1));
}
$p = $_;
}
$ret;
}
############################################################
1;
__END__
=head1 NAME
Compress::LZW -- Pure perl implementation of LZW
=head1 SYNOPSIS
use Compress::LZW;
my $compressed = compress($fatdata);
my $fatdata = decompress($compressed);
my $smallcompressed = compress($thindata, 12);
my $thindata = decompress($smallcompressed, 12);
=head1 DESCRIPTION
C<Compress::LZW> it a perl implementation of the newly free LZW
compression algorithm. It defaults to building a 16-bit codeword
table, but provides the ability to choose a 12-bit table also.
Depending on the size of your data, the 12-bit table may provide
better compression.
=head2 Functions
=over
=item C<compress>
Takes a string as its first argument, and returns the compressed
result. You can also specify the size of your codeword table in
C<@_[1]>, choosing either 12 or 16. 16 is the default. C<compress>
will
=item C<decompress>
Takes a string as its first argument, and returns the decompressed
result. You can also specify the size of your codeword table in
@_[1], choosing either 12 or 16. 16 is the default.
=back
=head1 EXPORTS
C<Compress::LZW> exports: C<compress> C<decompress>
That's all.
=head1 SEE ALSO
Other Compress::* modules, especially Compress::LZV1, Compress::LZF an
+d Compress::Zlib.
=head1 AUTHOR
Sean O'Rourke, E<lt>seano@cpan.orgE<gt> - Original author, C<Compress:
+:SelfExtracting>
Matt Howard E<lt>mhoward@hattmoward.orgE<gt> - C<Compress::LZW>
Bug reports welcome, patches even more welcome.
=head1 COPYRIGHT
Copyright (C) 2003 Sean O'Rourke & Matt Howard. All rights reserved,
+some wrongs
reversed. This module is distributed under the same terms as Perl
itself. Let me know if you actually find it useful.
=cut
Thanks,
mhoward - at - hattmoward.org
Re: RFC: Compress::LZW
by dash2 (Hermit) on Jun 20, 2003 at 00:53 UTC
|
My god, it's like Open Source 0-day warez! :-)
This will be great, perfect for packaging perl software to people in a nice, self-extracting file with no library requirements.
A massive flamewar beneath your chosen depth has not been shown here | [reply] [Watch: Dir/Any] |
|
??? I think your comment was meant for Compress::SelfExtracting, but that would have requirements anyway... namely a perl interpreter. I don't believe it applies the SelfExtracting to any of the libraries anyway. Also, it's okay to publish the code, it's just that you guys can't use it... until tomorrow ;)
mhoward - at - hattmoward.org
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: RFC: Compress::LZW
by diotalevi (Canon) on Jun 20, 2003 at 02:10 UTC
|
Please, please don't publish this until it handles .Z files correctly. That's the whole point to having this library (from my perspective).
| [reply] [Watch: Dir/Any] |
|
I understand your perspective, but there isn't much more to do to get it throwing out .Z's right and left. This is open-source, the idea is that someone else can look at my code and contribute. If someone doesn't heed the version number, the alpha status on CPAN, or the noted incompatibility with .Z files; aren't they asking for trouble? Development, man, development. Sure, someone CAN use the module for some nefarious perpose right now, but why do that when it will get better?
Anyway, I'm looking into the compress format. Lots to get sorted out though. It will take a little tweaking for the adaptive blocking to work as well as compress.
I have noted your desire for a compress-compatible format; trust me, I'm working. :)
mhoward - at - hattmoward.org
| [reply] [Watch: Dir/Any] |
Re: RFC: Compress::LZW
by Anonymous Monk on Jun 20, 2003 at 00:51 UTC
|
Agh! you've doomed us all! You just had to go and post it before the patent runs out... quick somebody change the date on it to tommorrow before 'they' see.
hehehe. Alright I've had my fun :). | [reply] [Watch: Dir/Any] |
Re: RFC: Compress::LZW
by meredith (Friar) on Jun 20, 2003 at 12:10 UTC
|
Before I upload this, I'm going to add a very obvious note that this isn't compatible with .Z files, nor is it currently stabilized in any way.
Also, I'm going to move this module to an OO interface (with some exported functions for the hardcore). This will provide a more plausible method for adaptive blocking, instead of a program having to feed _all_ of its data at once, we can "stream", just like compress does. OO will also provide for XS (well, more easily:) in the future.
Thanks for all the input so far everyone!
mhoward - at - hattmoward.org
| [reply] [Watch: Dir/Any] |
|
Might be also worth noting that it's only the US patent that has expired - IIRC, the international patent (e.g., the one affecting us elsewhere in the world ;) is in effect for another year yet. While it's unlikely to have any real legal implications, it should be noted.
| [reply] [Watch: Dir/Any] |
|
|