lostcause has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
It's been a long time since I asked a question on SOPW, such is the power of Super Search and the many arcane snippets of information one can find in the darkest corners of the monastery. Alas, this problem has me stumped.

I have automated downloading genome databases from ftp.ncbi.nih.gov/blast/db, uncompressing them, and indexing them with a search engine, prospector.ucsf.edu.

My script works on both Windows and Sun operating systems. I currently uncompress the databases with a system call, using either the uncompress command on Unix or a port of the tool on Windows (UNCOMP.exe). One of the databases (/genbank/genpept.fas.gz, Genpept) is a foo.gz compressed file, and I use the CPAN Compress::Zlib library for that database. I would like to use the Compress::Zlib library for all the files. I've played with the example scripts and include my modified versions below, but they don't seem to be working. Has anyone had success with foo.Z files before?

As some of the databases are very large, I don't want to do this in memory.

Which leads me to an additional question: is there any way of finding out the original file size before uncompressing the file? I can make a guess by dividing the compressed file size by 0.55 for amino acid files and 0.3 for DNA files. It would be great to know the actual file size, as I could then use that information to create a progress bar.
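A minimal sketch of the size question, under stated assumptions. The first function applies the ratio guess described above. The second reads the ISIZE trailer that the gzip format keeps in the last four bytes of a file (the uncompressed length modulo 2**32), so it only helps for .gz files; as far as I know the .Z format stores no original length, so the ratio guess remains the fallback there. Both function names are made up for illustration.

```perl
use strict;
use warnings;

# Rough guess: compressed size divided by an assumed compression ratio
# (0.55 for amino acid files, 0.3 for DNA files, as in the post).
sub estimate_uncompressed_size {
    my ($path, $ratio) = @_;
    die "No such file: $path\n" unless -e $path;
    my $compressed = (-s $path) || 0;
    return int($compressed / $ratio + 0.5);    # round to the nearest byte
}

# gzip only: the last four bytes of a .gz file (the ISIZE field) hold the
# uncompressed length modulo 2**32, stored little-endian.
sub gzip_stored_size {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode $fh;
    seek($fh, -4, 2) or die "Cannot seek in $path: $!\n";    # 2 = SEEK_END
    my $got = read($fh, my $tail, 4);
    close $fh;
    die "Short read on $path\n" unless defined $got && $got == 4;
    return unpack('V', $tail);
}
```

Because ISIZE wraps at 4 GB, the stored value is only a lower bound for very large databases, which is worth keeping in mind before wiring it to a progress bar.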

As always, thanks for any tips on solving this problem.

The code:
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

my ($tempFilename2, $tempFilename1, $gz, $buffer, $gzerrno);
$tempFilename1 = "yeast.aa.Z";
$tempFilename2 = "yeast.aa";

open (GZIPFILE, "$tempFilename1") || warn "Can't open zip input file: $tempFilename1: $!";
binmode GZIPFILE;
open (DATAFILE, ">>$tempFilename2") || warn "Can't open uncompressed data file: $tempFilename2: $!";
binmode DATAFILE;

my $x = inflateInit() or die "Cannot create a inflation stream\n" ;
my $input = '' ;
my ($output, $status) ;
while (read(GZIPFILE, $input, 4096)) {
    ($output, $status) = $x->inflate(\$input) ;
    print DATAFILE $output if $status == Z_OK or $status == Z_STREAM_END ;
    last if $status != Z_OK ;
}
die "inflation failed\n" unless $status == Z_STREAM_END ;
Or using the IO-Zlib library:
#!/usr/bin/perl -w
use strict;
use IO::Zlib;

my $tempFilename1 = "yeast.aa.Z";
my $tempFilename2 = "yeast.aa";

open (DATAFILE, ">>$tempFilename2") || warn "Can't open uncompressed data file: $tempFilename2: $!";

my $fh = new IO::Zlib;
if ($fh->open("$tempFilename1", "rb")) {
    print DATAFILE <$fh>;
    $fh->close;
}

Replies are listed 'Best First'.
Re: uncompressing a foo.Z file
by bart (Canon) on Aug 31, 2002 at 12:19 UTC
    I would like to use the Compress::Zlib library for all the files.
    Sorry, no can do. As you may know, Compress::Zlib is not much more than an interface to the Zlib library. I found this entry in the ZLib FAQ:
    13. Can zlib handle .Z files?
    No, sorry. You have to spawn an uncompress or gunzip subprocess, or adapt the code of uncompress on your own.
    The reason for this is indeed most likely what sauoq wrote: copyright and licensing problems. The main reason Zlib was written is precisely to circumvent these kinds of problems by offering a free alternative for compression. Unfortunately, this implies that it can't be compatible with what it tries to replace.

    From this Zlib FAQ entry, I would deduce that it's not illegal to write and use a program that is compatible with compress, but you cannot distribute it without a licence, not even for free, and a licence would cost you a lot of money.

    So, don't set your hopes too high on somebody writing such a Perl module.

      From this Zlib FAQ entry, I would deduce that it's not illegal to write and use a program that is compatible with compress, but you cannot distribute it without a licence, not even for free, and a licence would cost you a lot of money.

      I don't think there would be a real problem with, say, a Compress::Uncompress module which only uncompressed .Z files. I say so because gzip does so. It just doesn't create them.

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: uncompressing a foo.Z file
by sauoq (Abbot) on Aug 30, 2002 at 23:51 UTC

    Unix compress and uncompress utilities use LZW compression. I don't believe that Zlib handles LZW compression because it is a patented algorithm. It is the same compression algorithm used in GIF images and the source of all the controversy surrounding the use of GIFs.
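Since the two formats need different handling, a script can tell them apart before choosing a decompressor. The sketch below relies on the well-known two-byte signatures: gzip files start with 1F 8B, and compress (.Z) files start with 1F 9D. The function name and return values are made up for illustration.

```perl
use strict;
use warnings;

# Classify a compressed file by its first two bytes.
#   1F 8B -> gzip (readable through Compress::Zlib)
#   1F 9D -> Unix compress / LZW (.Z), which zlib does not decode
sub sniff_compression {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode $fh;
    my $got = read($fh, my $magic, 2);
    close $fh;
    return 'unknown' unless defined $got && $got == 2;
    return 'gzip'     if $magic eq "\x1f\x8b";
    return 'compress' if $magic eq "\x1f\x9d";
    return 'unknown';
}
```

A download script could use the result to send .gz files through Compress::Zlib and hand .Z files to an external tool, rather than trusting the file extension.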

    The gzip tool does handle uncompression of LZW-compressed files. I don't think the uncompression algorithm is patented. (Don't take my word on that, though.) Zlib likely doesn't bother because its purpose is to provide another compression algorithm, not to support uncompression of data compressed with a patented algorithm.

    There are a few other compression libraries on CPAN, but I don't think any of them support the functionality you need. Your best bet, until someone writes an uncompress module just for this, is to call another program to do it for you, as you are already doing.
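Calling another program does not have to mean loading the result into memory. Here is a rough sketch that reads the decompressor's output through a pipe in fixed-size chunks and writes each chunk straight to disk. It assumes gzip is on the PATH and that the list form of a piped open works on your platform (on older Windows Perls it may not). The function name and the optional progress callback are hypothetical.

```perl
use strict;
use warnings;

# Run an external decompressor and copy its stdout to $dest in 64 KB
# chunks, so the whole database never sits in memory. $on_chunk is an
# optional callback (a hypothetical hook for a progress bar) that
# receives the running byte count.
sub stream_command_to_file {
    my ($cmd, $dest, $on_chunk) = @_;
    open(my $in, '-|', @$cmd) or die "Cannot run $cmd->[0]: $!\n";
    binmode $in;
    open(my $out, '>', $dest) or die "Cannot write $dest: $!\n";
    binmode $out;
    my $total = 0;
    while (1) {
        my $got = read($in, my $buffer, 65536);
        die "Read error: $!\n" unless defined $got;
        last if $got == 0;                      # end of decompressed data
        print {$out} $buffer or die "Write error on $dest: $!\n";
        $total += $got;
        $on_chunk->($total) if $on_chunk;
    }
    close $out or die "Cannot close $dest: $!\n";
    close $in  or die "$cmd->[0] failed (exit status $?)\n";
    return $total;
}

# Example call (assumes gzip is installed):
# stream_command_to_file(['gzip', '-dc', 'yeast.aa.Z'], 'yeast.aa');
```

Passing the command as a list keeps the shell out of the picture, so file names with spaces need no quoting, and checking the status of the final close catches a decompressor that exits with an error partway through.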

    -sauoq
    "My two cents aren't worth a dime.";