kettle has asked for the wisdom of the Perl Monks concerning the following question:

Is there a simple way to read compressed files? I would like to read in a .gz file, manipulate the compressed text, and output a new compressed .gz file. I can pipe STDOUT through gzip, but would prefer not to use zcat to pipe things to STDIN. I found a couple of CPAN modules that support this to some extent, but they seem to still be in development.

Replies are listed 'Best First'.
Re: reading compressed data
by graff (Chancellor) on Dec 13, 2006 at 02:24 UTC
    If you have (or can install) PerlIO::gzip, you can read and write compressed files using an IO layer, like this:
    use PerlIO::gzip;

    open( INPUT,  "<:gzip", "old.gz" ) or die "old.gz: $!";
    open( OUTPUT, ">:gzip", "new.gz" ) or die "new.gz: $!";

    while (<INPUT>) {
        # do something with a line of text...
        s/[\r\n]+/\n/;    # for example, normalize line terminations
        print OUTPUT;
    }
    If for some reason you have constraints that get in the way of installing non-core modules, but you have "gzip" and "gunzip" on your system (and in your PATH), you can just use pipeline opens:
    open( INPUT,  "gunzip < old.gz |" ) or die $!;
    open( OUTPUT, "| gzip > new.gz" )   or die $!;

    while (<INPUT>) {
        # same as above...
    }
    There are other methods as well, involving other modules (try looking at the search results for gzip at CPAN).
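    For instance, the IO::Compress family (core as of Perl 5.9.3, on CPAN otherwise) can do the same job through an object interface. An untested sketch along the lines of the PerlIO::gzip version above; the first two lines just manufacture a small old.gz so the sketch runs standalone:

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# demo setup: write a small old.gz so this sketch runs standalone
gzip \"foo\r\nbar\r\n" => 'old.gz' or die "setup: $GzipError";

my $in  = IO::Uncompress::Gunzip->new('old.gz')
    or die "old.gz: $GunzipError";
my $out = IO::Compress::Gzip->new('new.gz')
    or die "new.gz: $GzipError";

while ( defined( my $line = $in->getline ) ) {
    $line =~ s/[\r\n]+/\n/;    # normalize line terminations, as above
    $out->print($line);
}

$in->close;
$out->close;
```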

    UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip. (see PerlIO::gzip or PerlIO::via::gzip).

        I haven't done / don't recall seeing any benchmarks comparing PerlIO::gzip against the pipeline open, and I wouldn't hazard a guess that one of them is likely to be significantly faster than the other.

        If it's just a one-shot pass over 5.3 GB, don't sweat it and use whichever one strikes you as more fun. But if this will be an ongoing, oft-repeated process working on lots of data, it might be worth your while to set up a simple test to see if there might be a speed difference.

        In that case, I'd advise against test scripts that only do the i/o. Contrast two versions of the script such that both do everything that needs to be done, and they differ only in the i/o method. If one is faster than the other, you'll get a clear idea of how important the difference is in the context of everything else the script does.
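        A skeleton for that kind of head-to-head, using the core Benchmark module (untested; 'sample.gz', the iteration count, and process_line() are placeholders for your real file and per-line work). I've shown the pipeline open against core IO::Uncompress::Gunzip here; if you have PerlIO::gzip installed, drop its "<:gzip" open in the same way:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use IO::Uncompress::Gunzip qw($GunzipError);

my $file = 'sample.gz';    # placeholder: point this at a representative file

unless ( -e $file ) {      # demo setup so the skeleton runs standalone
    require IO::Compress::Gzip;
    IO::Compress::Gzip::gzip( \( "some line of text\n" x 1000 ) => $file )
        or die "setup failed";
}

sub process_line { }       # stand-in for the script's real per-line work

timethese( 10, {
    pipe_open => sub {
        open my $fh, '-|', "gunzip < $file" or die $!;
        process_line($_) while <$fh>;
        close $fh;
    },
    io_uncompress => sub {
        my $z = IO::Uncompress::Gunzip->new($file)
            or die "$file: $GunzipError";
        process_line($_) while defined( $_ = $z->getline );
        $z->close;
    },
} );
```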


      Awesome, thanks a lot!
Re: reading compressed data
by jasonk (Parson) on Dec 13, 2006 at 02:29 UTC
      I found 4 of the ones you mention after a two-minute search, but when there are so many choices and I'm not familiar with any of them, it seems faster to ask than to read through all of the associated documentation and try to figure out whether there is one I ought to use, or others that I perhaps shouldn't. For example, the first module I googled upon, IO::Uncompress::RawInflate, warns "WARNING -- This is a Beta release. Do NOT use in production code." Anyway, thanks for the list.
Re: reading compressed data
by Util (Priest) on Dec 13, 2006 at 02:42 UTC
    Use the "piped" form of open(). Untested code:
    # Old-style: Bareword filehandles and two-arg opens:
    open IN,  "zcat $in_filename|"         or die;
    open OUT, "|gzip -c - > $out_filename" or die;

    # New style: Lexical filehandles and three-arg opens:
    open my $in_fh,  '-|', "zcat $in_filename"          or die;
    open my $out_fh, '|-', "gzip -c - > $out_filename"  or die;
    See also "Using open() for IPC" in perlipc.