rlb3 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have been given a compressed file that needs some parsing. The problem is that when I try to uncompress it, the file is bigger than 2G. I know this 2G limit is part of the ext3 file system, so I created a 4G JFS filesystem, but when the file hit the 2G mark it errored out again. I get the same result with ReiserFS. I was wondering: is it possible to use a CPAN module to open, mangle, and compress to a new file without having to uncompress the whole file? Any help would be appreciated.

rlb3

Replies are listed 'Best First'.
Re: File > 2G under linux
by Abigail-II (Bishop) on Sep 11, 2003 at 10:37 UTC
    To be able to deal with files over 2G, your perl must be compiled with USE_LARGE_FILES. Under Linux, Configure enables this by default starting from version 5.6.0. See also the INSTALL file, section "Large file support".

    You may want to check the output of perl -V.
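    For example, from the command line:

        $ perl -V:uselargefiles
        uselargefiles='define';

    If it says 'define', large file support was compiled in.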

    Abigail

      I think that's the point: ext3 really does support files over 2GB, and so do the other filesystems mentioned!
Re: File > 2G under linux
by broquaint (Abbot) on Sep 11, 2003 at 10:57 UTC
    If you've got access to perl5.8.0+ then you can take advantage of the funky new IO layering system and use PerlIO::gzip. So you could do something like this:
        use PerlIO::gzip;

        open my $in_fh  => '<:gzip', 'input.gz'  or die "ack: $!";
        open my $out_fh => '>:gzip', 'output.gz' or die "ack: $!";

        while (<$in_fh>) {
            do_stuff($_) if /matches some condition/;
            print {$out_fh} $_;
        }
    See the PerlIO docs for more info on IO layers in perl5.8.0+, and PerlIO::gzip for info on the gzip layer used above.
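    The layer can also be pushed onto an already-open handle with binmode; a minimal sketch:

        binmode $in_fh, ':gzip' or die "can't push gzip layer: $!";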
    HTH

    _________
    broquaint

      Now that's cool ... makes me wish I had 5.8. ++
Re: File > 2G under linux
by edan (Curate) on Sep 11, 2003 at 10:23 UTC

    Not sure about the filesystem issues, but you could try just uncompressing the file to STDOUT, reading that line-by-line to do your parsing, and writing to a pipe that compresses. I have done this with gzip successfully (not with large files, just in general). Something like this (UNTESTED):

        open(INPUT, "/usr/bin/gzip -d -c '$filename' |");
        open(OUTPUT, "| /usr/bin/gzip > '$filename'");
        while (<INPUT>) {
            # munge
            print OUTPUT;
        }
        close INPUT;
        close OUTPUT;

    You could also look at Compress::Zlib, which might work for you...
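    For what it's worth, a minimal Compress::Zlib sketch of the same read-munge-write loop might look like this (also untested, with placeholder filenames):

        use Compress::Zlib;

        my $in  = gzopen('input.gz',  'rb') or die "gzopen read: $gzerrno";
        my $out = gzopen('output.gz', 'wb') or die "gzopen write: $gzerrno";

        my $line;
        while ($in->gzreadline($line) > 0) {
            # munge $line here
            $out->gzwrite($line);
        }

        $in->gzclose;
        $out->gzclose;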

    --
    3dan

      It was written:

          open(INPUT, "/usr/bin/gzip -d -c '$filename' |");
          open(OUTPUT, "| /usr/bin/gzip > '$filename'");

      Make sure that you use a different value for $filename on each of these calls, or you (may|will) clobber the contents of the file you are trying to read, that is, unless you are using an OS with versioned files.
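      That is, something along these lines (with hypothetical $in_filename/$out_filename in place of the single $filename):

          open(INPUT, "/usr/bin/gzip -d -c '$in_filename' |");
          open(OUTPUT, "| /usr/bin/gzip > '$out_filename'");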

      --MidLifeXis

        Quite right. Good thing I included the 'UNTESTED' disclaimer! :-) I cut and pasted some code from different places, one of which did the reading and the other the writing; I was solving a different problem than the one posed here. Good eye!

        --
        3dan
Re: File > 2G under linux
by hardburn (Abbot) on Sep 11, 2003 at 13:56 UTC

    On the IA-32 (read: Intel) architecture, all Linux filesystems starting with the 2.4 kernel series support 64-bit file sizes. Note that the plain POSIX open() only uses 32-bit file offsets, so C code has to use a different function (open64(), IIRC) to get at larger files. My guess is that your decompression program is using the older version of open(), which you should be able to fix by compiling a newer version (assuming it's one of the common Free Software compressors, like gzip or bzip2).
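    From Perl, the 64-bit open can be requested explicitly where the platform defines the flag; a minimal sketch (whether Fcntl exports O_LARGEFILE on your system, and whether perl's own file offsets are 64-bit, depends on your build):

        use Fcntl qw(O_RDONLY O_LARGEFILE);

        # 'huge_file.gz' is a placeholder name
        sysopen(my $fh, 'huge_file.gz', O_RDONLY | O_LARGEFILE)
            or die "sysopen: $!";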

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated