hoffy has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I have been battling an issue where I need to create archives of files that can be anything up to 6GB in size.

I have been using the Archive::Zip module (as found in v5.8.9) and have bottomed out at the roughly 4GB limit that Zip appears to have.

Without going to another 3rd party application, does anyone have any suggestions on ways around this limit? I really don't want to split the files (these will need to be extracted by my operators... need to keep things simple!), but am open to using any other method.

Thanks in advance

Replies are listed 'Best First'.
Re: Creating Archives for Files > 4GB
by Corion (Patriarch) on Jul 23, 2010 at 06:40 UTC

    I think this limit is inherent to the .zip file format. Zip (File Format) says that the format only allocates 4 bytes for the uncompressed file size.
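
    A quick sanity check of that 4-byte limit:

    perl -le 'print 2**32 - 1'   # 4294967295 bytes, i.e. just under 4GiB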

    You will have to find another format to store your compressed files.

Re: Creating Archives for Files > 4GB
by desemondo (Hermit) on Jul 23, 2010 at 12:55 UTC
    Take a look at IO::Compress::Zip

    It supports the (fairly) new Zip64 extension, which handles files much larger than the 4GB limit of the original zip format.

    Bear in mind, though, that unless you're using Windows 7 or a recent *nix, you'll need a 3rd party app such as Info-ZIP 6.x or later to extract the data (as I don't think IO::Uncompress::Unzip supports reading Zip64 files just yet; hopefully it's coming soon).

    IO::Uncompress::Unzip does support Zip64, thanks pmqs!

      Yes, IO::Compress::Zip supports Zip64 (which allows zip members to be larger than 4Gig). Here is an example of how to use it:

      use IO::Compress::Zip qw(:all);

      zip [ "bigfile1", "bigfile2" ] => "myzip.zip", Zip64 => 1
          or die "Cannot create zip file: $ZipError\n";
      IO::Uncompress::Unzip supports Zip64 as well.
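
      And for the extraction side, a minimal sketch along the same lines (if I remember the docs correctly, the one-shot unzip function only extracts the first member of the archive):

      use IO::Uncompress::Unzip qw(:all);

      # One-shot interface: inflates the first zip member to a file.
      unzip "myzip.zip" => "bigfile1"
          or die "Cannot extract zip file: $UnzipError\n";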
Re: Creating Archives for Files > 4GB
by Khen1950fx (Canon) on Jul 23, 2010 at 12:22 UTC
    Archive::Any::Create could make things a lot easier for you. It'll create an archive either in tar.gz or zip format. I'd go with the tar.gz. Here's an example:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Archive::Any::Create;

    my $archive = Archive::Any::Create->new;
    $archive->container('myStuff');    # top-level dir with your files
    $archive->add_file('stuff.pl', 'perl script');
    $archive->add_file('morestuff.pl', 'perl script');
    $archive->write_file('myStuff.tar.gz');
    # or
    # $archive->write_file('myStuff.zip');
    It can knock off a 6 GB tarball like it was 6 K.
      True, but gzip doesn't provide random access to individual files within the gzipped archive: the entire archive must be decompressed first, even if you only want to access one file (see the sketch below)...

      For some tasks/processes that is a BIG difference... for others, it probably doesn't matter much...
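
      If random access is what matters, zip keeps it, at the cost of the size limit discussed above. A rough sketch with Archive::Zip (the archive and member names are made up):

      use Archive::Zip qw(:ERROR_CODES);

      my $zip = Archive::Zip->new();
      $zip->read('myStuff.zip') == AZ_OK
          or die "Cannot read zip file\n";

      # Pull out a single member without inflating the rest.
      $zip->extractMember('stuff.pl') == AZ_OK
          or die "Cannot extract member\n";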
Re: Creating Archives for Files > 4GB
by aquarium (Curate) on Jul 23, 2010 at 07:20 UTC
    And apart from the zip format limitation, creating such huge zips is fraught with lengthy zipping/unzipping times and the risk of running out of space, since such operations need temporary additional storage.
    Maybe this part of the process needs to be re-evaluated and re-designed if possible?
    the hardest line to type correctly is: stty erase ^H
Re: Creating Archives for Files > 4GB
by furry_marmot (Pilgrim) on Jul 24, 2010 at 04:47 UTC

    What kind of data is it? In particular, are you compressing 6 GB files into zip archives? Or do you have a collection of much smaller files that are collectively 6 GB?

    Assuming you have the disk space, what if the information your operators are extracting was just sitting in directories? That may not be a solution, but if your system could function that way, then maybe you can think about other ways to approach what you're doing. A database would be another way, except then you'd have to maintain a database...

    --marmot

      Thanks for the reply

      I am talking about one file of 6GB of data. These files are just straight ASCII characters (financial institution statement files), so when using something like 7-Zip, they compress down reasonably well. I would use 7-Zip, but an overzealous auditor and manager has decreed that Open Source is bad.... (You wouldn't believe how much hassle I had to go through to get Perl approved! I am not even allowed to download modules from the CPAN :-( )

        Ye Gods! My condolences. :-(

        Well...hmmmm. My thoughts all go to breaking your file up into chunks. I mean, on the face of it, you've got a system where your data source drops 6GB files and actually expects someone to use them. But if you can't get Perl installed easily, you're probably not in a position to re-engineer your company's processes ("Excuse me sir, I think I'm smarter than you and...what's that?...yes, I like working here...oh...sorry...").

        So, for a chunk-wise example, you can use sysread() to read a monster text file (perhaps this one) in smallish chunks (512k, 4MB, whatever). You write a function like get_next_chunk() that manages the chunk-reading as needed, finding the start and end of the current "record", as defined by you. Then you write your main function with a while(get_next_record()){} loop, and it never has to know about sysread() or chunks at all.
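
        A minimal sketch of that idea, assuming newline-delimited records (the file name, chunk size and helper name are just placeholders):

        use strict;
        use warnings;

        my $CHUNK_SIZE = 4 * 1024 * 1024;   # 4MB per sysread()
        my $buffer     = '';

        open my $fh, '<', 'statements.txt' or die "Cannot open: $!";

        # Return the next newline-terminated record, refilling the
        # buffer with sysread() whenever it runs dry.
        sub get_next_record {
            until ($buffer =~ /\n/) {
                my $n = sysread($fh, $buffer, $CHUNK_SIZE, length $buffer);
                die "Read error: $!" unless defined $n;
                last if $n == 0;   # EOF
            }
            return undef unless length $buffer;
            $buffer =~ s/\A(.*?\n|.+)//s;   # peel off one record (or the final tail)
            return $1;
        }

        while (defined(my $record = get_next_record())) {
            # process $record here
        }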

        So now you abstract this a bit further. In some pre-process, you break your data into chunks (size dependent on memory and performance) and zip them separately. Then your get_next_record() function uses Archive::Zip or IO::Compress to read and decompress each chunk.
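
        Reading the pre-zipped chunks back might then look something like this (the chunkNNNN.zip naming scheme is made up for the example):

        use strict;
        use warnings;
        use IO::Uncompress::Unzip qw(unzip $UnzipError);

        # Inflate one pre-zipped chunk into memory and return it.
        sub read_chunk {
            my ($n) = @_;
            my $file = sprintf 'chunk%04d.zip', $n;
            return undef unless -e $file;
            my $data;
            unzip $file => \$data
                or die "Cannot inflate $file: $UnzipError\n";
            return $data;
        }

        my $n = 0;
        while (defined(my $chunk = read_chunk($n++))) {
            # feed $chunk to the record-splitting code above
        }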

        It might require a bit of glue in the middle of your current process, but this is where I'd start. I realize I'm talking through my hat here because I don't know anything about the structure of your files or what you're doing with them.

        Cheers!
        --marmot

        Update: Got back from an inspection and clarified my comments.