js1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm seeking some wisdom so I have come to you.

I have a big gzip file which I unzip from a shell script and pass to my perl program:

gzip -dc log.gz | prog.pl

Using a while(<>) loop, I read in each line and run some tests on that line.

What I need to know is, how much of that gzip file I have processed, so that I can pass a value to my progress bar to show how far through it is.

Thanks for any advice.

js1.

Replies are listed 'Best First'.
Re: tracking progress of gzip'd file
by arden (Curate) on Feb 09, 2004 at 20:14 UTC
    js1, why not do the unzipping within your perl script? You could call the script with the argument of the filename. That way, within the script you can first determine the file size, then keep track of how much data you've processed so far, calculating your progress. . . There's no need to have gzip pipe the data to perl.
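
    A minimal sketch of that idea, assuming the core module IO::Uncompress::Gunzip is used for the decompression (Compress::Zlib would also do). The name process_gzip and its callback argument are made up for illustration. Progress is measured as the offset reached in the compressed file divided by its size on disk, so it moves in steps as the module reads blocks and is only an approximation of how much output has been handled:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: read a gzip file line by line and call
# $on_line->($line, $fraction) for each line, where $fraction is
# (compressed bytes read so far) / (compressed file size).
sub process_gzip {
    my ($file, $on_line) = @_;

    my $total = -s $file;    # size of the compressed file on disk
    open(my $raw, '<', $file) or die "Cannot open $file: $!";
    binmode $raw;

    # Hand our own filehandle to the decompressor so we can ask it
    # (via tell) how far into the compressed file it has read.
    my $gz = IO::Uncompress::Gunzip->new($raw)
        or die "gunzip failed: $GunzipError";

    while (defined(my $line = $gz->getline())) {
        my $done = tell($raw);    # offset in the compressed input
        $on_line->($line, $total ? $done / $total : 1);
    }

    $gz->close();
    close $raw;
}
```

    Called as process_gzip($file, sub { my ($line, $fraction) = @_; ... }), the second argument can be fed straight to a progress bar.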

      The above is probably suggesting Compress::Zlib to do the expanding rather than shelling out to gzip.

      As a somewhat related aside: the problem is that you're only going to know how much of the input file you've processed; you can't reliably know how much output you're going to get before you actually uncompress it (with gzip or bzip2; zip stores the compressed and uncompressed sizes in its header). As an extreme example, this 1483 byte uuencoded file will blow up to 1M of *'s when gunzip'd.

      Update: Never mind me, I'd forgotten about gzip -l as mentioned below. Feh, must be Monday . . .

Re: tracking progress of gzip'd file
by jsegal (Friar) on Feb 09, 2004 at 20:18 UTC
    You may need to restructure how you run your script a bit to do this, but you could start by parsing the output of gunzip -l to get the uncompressed size. Then you can keep track of how much data you have read, and use that to update your progress bar. E.g. something like:
    my $amount_so_far = 0;
    while (my $line = <>) {
        $amount_so_far += length($line);    # bytes of uncompressed data read so far
        update_progress_bar_as_needed($amount_so_far, $total_length);
    }

    Of course, you need to get the $total_length into your script, which you could do either by passing in the filename to your prog.pl and having it call gzip (both with -l to get the length and with -dc to do the unzipping), or by externally calling gunzip -l to get the length and passing that in as a parameter to prog.pl. I would recommend the former, BTW...
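
    A rough sketch of the first option, where prog.pl takes the filename and calls gzip twice itself. It assumes a gzip binary on the PATH whose -l output is a header line followed by a data line with the compressed and uncompressed sizes as the first two numbers. The sub names and the progress callback are invented for the example:

```perl
use strict;
use warnings;

# Ask `gzip -l` for the uncompressed size of $file, in bytes.
sub uncompressed_size {
    my ($file) = @_;
    # List-form open avoids passing the filename through a shell.
    open(my $list, '-|', 'gzip', '-l', $file)
        or die "Cannot run gzip -l: $!";
    my @out = <$list>;
    close $list or die "gzip -l failed for $file";

    # Data line: compressed size, uncompressed size, ratio, name.
    my ($size) = $out[-1] =~ /^\s*\d+\s+(\d+)/
        or die "Could not parse gzip -l output";
    return $size;
}

# Stream the uncompressed lines and report progress after each one.
sub process_with_progress {
    my ($file, $on_progress) = @_;
    my $total = uncompressed_size($file);

    open(my $in, '-|', 'gzip', '-dc', $file)
        or die "Cannot run gzip -dc: $!";
    my $so_far = 0;
    while (my $line = <$in>) {
        $so_far += length $line;
        # ... run the per-line tests here ...
        $on_progress->($so_far, $total);
    }
    close $in or die "gzip -dc failed for $file";
}

# Usage: prog.pl log.gz
if (@ARGV) {
    process_with_progress($ARGV[0], sub {
        my ($done, $total) = @_;
        printf STDERR "\r%3d%%", $total ? 100 * $done / $total : 100;
    });
    print STDERR "\n";
}
```

    Printing the percentage on every line is wasteful for a big log; in practice the callback would probably only redraw the bar when the whole-number percentage changes.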

    Good luck...



    --JAS
Re: tracking progress of gzip'd file
by Vautrin (Hermit) on Feb 09, 2004 at 23:07 UTC
    If you really want to use a shell script, consider chaining in the wc -l command. That way the first line passed to your program could be the number of lines you have to process. Even better, follow one of the suggestions others have made about opening the file from within perl.


      Thanks for the replies,

      I have taken your advice and coded the gzip from my perl script using Compress::Zlib. However, I think I may have run into a problem already. How do I determine the uncompressed size of the gz file using this module?

      js1.

        I've managed to get it running gzip -l:

        $gziplist = `gzip -l $file`;
        # header line, then: compressed size, uncompressed size, ratio, name
        ($filesize) = $gziplist =~ /\n\s*\d+\s+(\d+)/;
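
        As an alternative to shelling out, the gzip format itself stores the uncompressed length in the last four bytes of the file (the ISIZE field, little-endian). The sketch below reads that field directly; gzip_isize is a made-up name, not something Compress::Zlib provides. Two caveats: the value is the length modulo 2**32, so it is wrong for data of 4GB or more, and for a file made of several concatenated gzip members it only describes the last one.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Return the ISIZE trailer field of a gzip file:
# the uncompressed length modulo 2**32.
sub gzip_isize {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode $fh;

    # ISIZE is the final 4 bytes of the file.
    seek($fh, -4, SEEK_END) or die "Cannot seek in $file: $!";
    my $got = read($fh, my $buf, 4);
    die "Short read on $file" unless defined $got && $got == 4;
    close $fh;

    return unpack('V', $buf);    # 32-bit little-endian unsigned integer
}
```

        With that, $filesize = gzip_isize($file); gives the same kind of number as the second column of gzip -l, without starting another process.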