in reply to reading compressed data

If you have (or can install) PerlIO::gzip, you can read and write compressed files using an IO layer, like this:
    use PerlIO::gzip;

    open( INPUT, "<:gzip", "old.gz" ) or die "old.gz: $!";
    open( OUTPUT, ">:gzip", "new.gz" ) or die "new.gz: $!";
    while (<INPUT>) {
        # do something with a line of text...
        s/[\r\n]+/\n/;  # for example, normalize line terminations
        print OUTPUT;
    }
If for some reason you have constraints that get in the way of installing non-core modules, but you have "gzip" and "gunzip" on your system (and in your PATH), you can just use pipeline opens:
    open( INPUT, "gunzip < old.gz |" ) or die $!;
    open( OUTPUT, "| gzip > new.gz" ) or die $!;
    while (<INPUT>) {
        # same as above...
    }
There are other methods as well, involving other modules (try looking at the search results for gzip at CPAN).
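One such alternative, sketched here as an illustration rather than a recommendation, is the IO::Compress::Gzip / IO::Uncompress::Gunzip pair, which ships with recent Perls (core since 5.10, as far as I know). The helper name normalize_gz is made up for this example, and the loop body is the same line-ending cleanup as above.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not from any module): copy one gzip file to
# another, normalizing line terminations along the way.
sub normalize_gz {
    my ( $in_name, $out_name ) = @_;

    my $in = IO::Uncompress::Gunzip->new($in_name)
        or die "$in_name: $GunzipError";
    my $out = IO::Compress::Gzip->new($out_name)
        or die "$out_name: $GzipError";

    # getline returns undef at end of file
    while ( defined( my $line = $in->getline ) ) {
        $line =~ s/[\r\n]+/\n/;
        $out->print($line);
    }

    $in->close;
    $out->close or die "$out_name: $GzipError";
    return;
}

# Only runs when two file names are given on the command line,
# e.g.:  perl normalize.pl old.gz new.gz
normalize_gz(@ARGV) if @ARGV == 2;
```

Unlike the pipeline open, this does not depend on gzip and gunzip being in your PATH.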

UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip (see PerlIO::gzip or PerlIO::via::gzip).

Re^2: reading compressed data
by kettle (Beadle) on Dec 13, 2006 at 02:34 UTC
      I haven't done / don't recall seeing any benchmarks comparing PerlIO::gzip against the pipeline open, and I wouldn't hazard a guess that one of them is likely to be significantly faster than the other.

      If it's just a one-shot pass over 5.3 GB, don't sweat it and use whichever one strikes you as more fun. But if this will be an ongoing, oft-repeated process working on lots of data, it might be worth your while to set up a simple test to see if there might be a speed difference.

      In that case, I'd advise against test scripts that only do the i/o. Contrast two versions of the script such that both do everything that needs to be done, and they differ only in the i/o method. If one is faster than the other, you'll get a clear idea of how important the difference is in the context of everything else the script does.
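
A rough sketch of such a comparison, using the core Benchmark module, might look like the following. The sub names are invented for the example, process_lines is only a stand-in for whatever your real script does with each line, and the list-form pipe open assumes a Unix-like system with gunzip in the PATH.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in for the real per-line work (hypothetical): put everything
# your script actually does here, so both variants do the full job.
sub process_lines {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        $line =~ s/[\r\n]+/\n/;
        $count++;
    }
    return $count;
}

# Variant 1: the :gzip IO layer (needs PerlIO::gzip installed)
sub open_with_layer {
    my ($file) = @_;
    require PerlIO::gzip;
    open( my $fh, "<:gzip", $file ) or die "$file: $!";
    return $fh;
}

# Variant 2: pipeline open through an external gunzip
sub open_with_pipe {
    my ($file) = @_;
    open( my $fh, "-|", "gunzip", "-c", $file ) or die "gunzip $file: $!";
    return $fh;
}

# Only runs when a file name is given, e.g.:  perl compare.pl old.gz
if ( @ARGV == 1 ) {
    my $file = $ARGV[0];
    cmpthese( 3, {
        layer => sub { my $fh = open_with_layer($file); process_lines($fh); close $fh },
        pipe  => sub { my $fh = open_with_pipe($file);  process_lines($fh); close $fh },
    } );
}
```

The two variants differ only in how the handle is opened, so any gap in the reported rates reflects the i/o method in the context of the whole job.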

      UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip (see PerlIO::gzip or PerlIO::via::gzip).

Re^2: reading compressed data
by kettle (Beadle) on Dec 13, 2006 at 02:28 UTC
    Awesome, thanks a lot!