Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to process a large number of huge gzipped log files. I have working code that does the obvious decompression and recompression:

# ...
system('gunzip', "$filename.gz") == 0 or die "gunzip failed: $?";
open(my $in, '<', $filename) or die "unable to open '$filename': $!";
while (<$in>) {
    # process each line
}
close $in;
system('gzip', $filename) == 0 or die "gzip failed: $?";
# ...
I have a lot of files to process. How can I use zcat to make this run faster? Reading the entire log file into a scalar with slurp mode is prohibitive given the file size.

Any ideas would be appreciated. Thanks.

Re: using zcat for input?
by Zaxo (Archbishop) on Nov 01, 2004 at 00:28 UTC

    In 5.8+ with PerlIO, you can use PerlIO::gzip.

    use PerlIO::gzip;

    for (glob '*.gz') {
        open my $fh, '<:gzip', $_ or die $!;
        {
            local $_;
            while (<$fh>) {
                # do stuff
            }
        }
        close $fh or die $!;
    }
    If you're in an earlier perl, Compress::Zlib has what you need.
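For perls without PerlIO, a minimal sketch of the Compress::Zlib route might look like the following. It follows the gzopen/gzreadline interface described in the Compress::Zlib documentation; the sub name each_gz_line and its callback argument are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use Compress::Zlib;   # exports gzopen, $gzerrno and Z_STREAM_END

# Hypothetical helper: read a gzipped file one line at a time through
# Compress::Zlib, calling $callback for each line. Nothing is
# decompressed to disk.
sub each_gz_line {
    my ($file, $callback) = @_;
    my $gz = gzopen($file, "rb")
        or die "Cannot open $file: $gzerrno\n";
    my $line;
    # gzreadline returns the number of bytes read, 0 at EOF, -1 on error
    while ($gz->gzreadline($line) > 0) {
        $callback->($line);
    }
    # At a clean end of file $gzerrno is Z_STREAM_END; anything else is an error
    die "Error reading from $file: $gzerrno\n"
        if $gzerrno != Z_STREAM_END;
    $gz->gzclose();
    return;
}
```

Passing a callback keeps memory use flat regardless of file size, since only one line is held at a time.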

    After Compline,
    Zaxo

      I just tested the speed of these three possibilities:

      Compress::Zlib took 3:55 min
      PerlIO::gzip took 2:51 min
      and zcat took 2:30 min

      As I am parsing several gigabytes of gzipped logs per day, I'll stick with zcat.
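Anyone wanting to repeat this kind of comparison on their own logs could use a small wall-clock harness along these lines. This is a sketch, not the code used for the timings above; the sub name time_reader and the ($file, $callback) reader convention are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # sub-second wall-clock timer

# Hypothetical helper: run one line-reader over $file and report the
# elapsed wall-clock seconds plus the number of lines it delivered.
# $reader is any sub taking ($file, $callback), for example a wrapper
# around zcat, PerlIO::gzip or Compress::Zlib.
sub time_reader {
    my ($reader, $file) = @_;
    my $lines = 0;
    my $start = time();
    $reader->($file, sub { $lines++ });
    return (time() - $start, $lines);
}
```

Counting lines alongside the time is a cheap sanity check that every reader saw the same data.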

      Cheers
      Schuk

Re: using zcat for input?
by atcroft (Abbot) on Nov 01, 2004 at 00:22 UTC

    You would want to open it as piped input, such as the following:

    open(IN, sprintf("zcat %s |", $filename))
        or die("Can't open pipe from command 'zcat $filename' : $!\n");
    while (<IN>) {
        # process lines
    }
    close(IN);

    See also the documentation for the open() function (also available via 'perldoc -f open').
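Perl 5.8 and later also accept a list form of the piped open, which starts zcat directly without a shell, so a filename containing spaces or shell metacharacters is passed through literally. A minimal sketch, assuming a zcat that understands .gz files (as GNU gzip's does); the sub name each_zcat_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: stream the decompressed contents of $file through
# an external zcat process, calling $callback for each line.
sub each_zcat_line {
    my ($file, $callback) = @_;
    # List form: no shell is involved, so $file needs no quoting
    open(my $in, '-|', 'zcat', $file)
        or die "Can't start zcat for '$file': $!\n";
    while (my $line = <$in>) {
        $callback->($line);
    }
    # On a piped handle, close() waits for zcat and returns false if it
    # exited non-zero; the exit status is then in $?
    close($in)
        or die "zcat '$file' failed: exit status " . ($? >> 8) . "\n";
    return;
}
```

Checking the result of close matters here: without it, a truncated or corrupt archive would just look like a shorter log.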

    Hope that helped.

    Update: (31 Oct 2004)

    Added link to documentation for open(). Changed from using 'gzip -dc $filename |' to 'zcat $filename |'.

      Just curious, but why the sprintf on the open? Wouldn't open(IN, "zcat $filename |") or die "..." be sufficient?

      thor

      Feel the white light, the light within
      Be your own disciple, fan the sparks of will
      For all of us waiting, your kingdom will come

        At least not for me, since the scripts take a lot of time to finish.