in reply to gziping files on server

I've had very good results in terms of performance letting Perl handle the compression via the Compress::Zlib library (it does the crunching in C). Basically, the speed penalty of handling it in perl is offset by avoiding the cost of spawning children (although YMMV). If you use the CPAN shell, you no doubt already have the module installed.

    #!/usr/bin/perl -w
    use strict;
    use Compress::Zlib;

    my $file = shift or die "no file on command line.\n";

    my( $d, $status ) = deflateInit( {-Level => Z_BEST_COMPRESSION} );
    die "deflator construction failed: $status\n" unless $status == Z_OK;

    my $deflated;
    open IN, $file or die "Cannot open $file for input: $!\n";
    while( <IN> ) {
        ($deflated, $status) = $d->deflate( $_ );
        die "deflator deflate failed: $status\n" unless $status == Z_OK;
        print $deflated;
    }
    ($deflated, $status) = $d->flush();
    die "deflator final flush failed: $status\n" unless $status == Z_OK;
    print $deflated;
    close IN;

Note that this script does not produce a zipfile directory, so you can't use gunzip/unzip on it directly; it is just the raw stream. You would decompress the file using an analogous inflate script (examples of how to do this are included in the pod).
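For what it's worth, the inflate counterpart might look something like this. This is a sketch written against the Compress::Zlib pod rather than code from the original script, and the 4096-byte buffer size is an arbitrary choice:

```perl
#!/usr/bin/perl -w
# Sketch: inflate a raw deflate stream such as the one produced by the
# script above. Written against the Compress::Zlib pod, not benchmarked.
use strict;
use Compress::Zlib;

my $file = shift or die "no file on command line.\n";

my( $i, $status ) = inflateInit();
die "inflator construction failed: $status\n" unless $status == Z_OK;

open IN, $file or die "Cannot open $file for input: $!\n";
binmode IN;                          # byte-oriented reads
my $buffer;
while( read( IN, $buffer, 4096 ) ) {
    # inflate() consumes $buffer and returns what it could decompress
    my( $inflated, $istatus ) = $i->inflate( $buffer );
    die "inflator inflate failed: $istatus\n"
        unless $istatus == Z_OK or $istatus == Z_STREAM_END;
    print $inflated;
    last if $istatus == Z_STREAM_END;
}
close IN;
```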

This code is sub-optimal in that it reads the file line-by-line, instead of in blocks of 4096 bytes. This was a proof-of-concept demo I hacked up a while ago. I must say in passing though that the Compress::Zlib interface is truly awful.
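A block-read variant would look something like the following. Same logic as the script above, but with the while( <IN> ) loop swapped for fixed 4096-byte reads; this is a sketch, not benchmarked code:

```perl
#!/usr/bin/perl -w
# Block-read variant of the deflate script: reads fixed 4096-byte
# chunks instead of lines. A sketch only, not benchmarked.
use strict;
use Compress::Zlib;

my $file = shift or die "no file on command line.\n";

my( $d, $status ) = deflateInit( {-Level => Z_BEST_COMPRESSION} );
die "deflator construction failed: $status\n" unless $status == Z_OK;

open IN, $file or die "Cannot open $file for input: $!\n";
binmode IN;                          # byte-oriented reads
my( $buffer, $deflated );
while( read( IN, $buffer, 4096 ) ) {
    ($deflated, $status) = $d->deflate( $buffer );
    die "deflator deflate failed: $status\n" unless $status == Z_OK;
    print $deflated;
}
($deflated, $status) = $d->flush();
die "deflator final flush failed: $status\n" unless $status == Z_OK;
print $deflated;
close IN;
```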

If space is at a premium on the server, and you have the CPU cycles to spare, you should really be looking at bzip2 instead.
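If you want bzip2 but would still rather stay in-process, there is a Compress::Bzip2 module on CPAN as well. A minimal sketch, assuming its memBzip() in-memory interface (I have not used it myself, so check the pod before relying on this):

```perl
#!/usr/bin/perl -w
# Whole-file bzip2 compression via Compress::Bzip2 -- a sketch only.
# Assumes the module's memBzip() in-memory interface; slurping the
# whole file is fine for small inputs but not for very large files.
use strict;
use Compress::Bzip2 qw(memBzip);

my $file = shift or die "no file on command line.\n";

open IN, $file or die "Cannot open $file for input: $!\n";
binmode IN;
my $data = do { local $/; <IN> };    # slurp the whole file
close IN;

my $compressed = memBzip( $data );
defined $compressed or die "memBzip failed\n";
print $compressed;
```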


--
g r i n d e r

Re: Re: gziping files on server
by scain (Curate) on Jul 17, 2001 at 17:54 UTC
    grinder,

    Thanks for bringing bzip2 to my attention. I work with largeish files (400-1000M) which I keep compressed to save space. I don't exactly have cycles to spare, but I wanted to see how much better compression I got with bzip2 over gzip. Surprisingly, bzip2 gave better compression in about half the time. The main caveat here is that the files contain DNA sequence data, so they are similar to, but not the same as, regular text files.

    Here's what I got for my test case of 1 file:
        Original file size:  316212340
        gzip compressed:      96294342   (30.5% of original)
        gzip CPU seconds:          476
        bzip2 compressed:     88270646   (27.9% of original)
        bzip2 CPU seconds:         269
    Thanks again,
    Scott