erm has asked for the wisdom of the Perl Monks concerning the following question:

Please could someone help solve a problem I have using the Compress::Zlib module? (apologies for length of post)
Background:
OS: AIX
I am trying to unzip a .gz file that is too large for my version of gzip (117086673 bytes). Gzip just returns that the file is 'too large', so I thought Perl could help out.
I tried the following code (cribbed from CPAN):
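    my $buffer;
    my $gz = gzopen ("$input_dir/$file", 'rb') || die "Can't open gz $file\n";
    # gzread returns the number of bytes read: 0 at end of file, -1 on error
    while ($gz -> gzread ($buffer) > 0) {
        # gzread fills at most 4096 bytes per call by default
        syswrite DATAFILE, $buffer, 4096;
    }
    $gz -> gzclose;
    close DATAFILE;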
The result was an uncompressed, but incomplete, output file. I thought maybe the buffer wasn't flushing at the end, so I also tried (courtesy of CPAN):
    open (GZIPFILE, "$input_dir/$file")
        || warn "Can't open zip input file: $file: $!";
    binmode GZIPFILE;
    open (DATAFILE, ">>$input_dir/$dat_file")
        || warn "Can't open uncompressed data file: $dat_file: $!";

    # deflateInit/deflate are Compress::Zlib's compression calls
    # (inflateInit/inflate are the decompression counterparts)
    my $deflator = deflateInit()
        or die "Cannot create a deflation stream\n" ;
    my ($output, $status);
    while (<GZIPFILE>) {
        ($output, $status) = $deflator->deflate($_) ;
        $status == Z_OK or die "deflation failed\n" ;
        print (DATAFILE $output) ;
    }
    ($output, $status) = $deflator->flush() ;
    $status == Z_OK or die "deflation failed\n" ;
    print (DATAFILE $output) ;
The result was again a partially uncompressed file. Is there a maximum file size that Zlib can handle? If there is, is it configurable? If not, does anybody know what I am doing wrong?
Any other suggestions?
Thanks,
Erm.

Replies are listed 'Best First'.
Re: Zlib is stopping me leaving work early and going to the pub...
by hannibal (Scribe) on Apr 06, 2001 at 19:33 UTC

    My first question is: if it is too big for your gzip, how did you get your .gz in the first place? I suppose you got it off another system or something.

    From a quick trip to the gzip homepage, FAQ #10 seems like it might be relevant, because it is about gzip handling files larger than 4 GB (it requires a patch to gzip and some added compilation flags for AIX, which caught my attention in your case). However, your file is nowhere near 4 GB. Skimming quickly through zlib's homepage, I could not find anything about maximum file sizes for zlib. Gzip and zlib are both based on the same algorithm (deflate), but the file size thing might just be a soft limit set somewhere. My suggestion is to dig through the zlib page and, if you can't find anything there, email the authors of zlib.

    Anyway I hope that helps a little :)

Re: Zlib is stopping me leaving work early and going to the pub...
by Malkavian (Friar) on Apr 06, 2001 at 19:58 UTC
    Another couple of gotchas that I've found along the way:

    • Maximum file size of the operating system: On Linux 2.2.x, this is a 2GB file limit. Anything above that will give you errors and a truncated file. Other OSes have other file size limits. Check on this.
    • Errors because of archive creation: A common way to create a large archive is simply to concatenate gzips together (*NIX style: cat archive1.gz >> archive2.gz). If erroneous data gets thrown in here in one of the concatenation steps, it could break your unzip stream.
    • Errors because of a double zip in a concatenated archive: If the above concatenation is used to create the archive, it's possible that at some point a zip is then itself zipped. This actually increases the zipped zip's size by a few bytes for the headers, but unzipping it would then produce a zip file. This could explain your 'partially unzipped file'.
    The ports of call are thus to check out your maximum OS file size (shouldn't really be a problem for AIX), your version of Zlib, and the way the archive was created.
    Also make sure the partition you're unzipping on can take the data. The zip size itself shouldn't be a problem: a fair whack of my work deals with pretty big log files. On average, the gzips I work with are 170-230MB in size, and they work fine.

    Hopefully this helps some,

    Malk
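
The double-zip case described above can be checked cheaply. Every gzip stream begins with the two magic bytes 0x1f 0x8b, so if the freshly unzipped output also begins with them, it is probably still a gzip file and needs another pass. A minimal sketch; the sub name looks_gzipped is made up for illustration and is not part of Compress::Zlib:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Compress::Zlib): returns true if the
# file starts with the gzip magic bytes 0x1f 0x8b.
sub looks_gzipped {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode $fh;
    # read() returns the number of bytes read, or undef on error
    my $got = read($fh, my $magic, 2);
    close $fh;
    return defined $got && $got == 2 && $magic eq "\x1f\x8b";
}
```

Calling it on the output file after unzipping, and warning when it returns true, would be one way to spot a zip inside a zip. A match on two bytes is only a hint, since ordinary data can start with the same bytes by chance.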
Re: Zlib is stopping me leaving work early and going to the pub...
by erm (Initiate) on Apr 06, 2001 at 20:43 UTC
    Thanks Malk & Hannibal,
    It turns out not to have been a problem with Zlib at all. A system variable (ulimit) was not large enough!
    Thanks for your help.
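
A truncated output file with no error message usually means a failed write went unnoticed. Below is a minimal sketch of the gzread loop with every read and write checked. It assumes the limit involved was the per-process file size limit, which the thread does not actually state. The sub name gunzip_file is made up for illustration; gzopen, gzread, gzclose and $gzerrno come from Compress::Zlib.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Hypothetical helper: uncompress $src to $dst, dying on any read or
# write failure. Returns the number of uncompressed bytes written.
sub gunzip_file {
    my ($src, $dst) = @_;

    # Assumption: the limit hit was the file size limit. Exceeding it
    # normally kills the process with SIGXFSZ; ignoring the signal makes
    # the write fail with an error (EFBIG) that can be reported instead.
    local $SIG{XFSZ} = 'IGNORE' if exists $SIG{XFSZ};

    my $gz = gzopen($src, 'rb') or die "Can't open gz $src: $gzerrno\n";
    open(my $out, '>', $dst) or die "Can't open $dst: $!\n";
    binmode $out;

    my ($buffer, $bytes, $total) = ('', 0, 0);
    # gzread returns bytes read: 0 at end of file, -1 on error
    while (($bytes = $gz->gzread($buffer)) > 0) {
        print {$out} $buffer or die "Write to $dst failed: $!\n";
        $total += $bytes;
    }
    die "Read error in $src: $gzerrno\n" if $bytes < 0;

    $gz->gzclose;
    # Buffered output may only fail when flushed, so check close as well
    close($out) or die "Close of $dst failed: $!\n";
    return $total;
}
```

Something like gunzip_file("$input_dir/$file", "$input_dir/$dat_file") would then either finish or die with the operating system's reason, rather than leave a short file behind.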