hoffy has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I have been battling an issue where I need to create archives of files that can be anything up to 6GB in size.

I have been using the Archive::Zip module (as found in v5.8.9) and have bottomed out at the roughly 4GB limit that Zip appears to have.

Without going to another 3rd party application, does anyone have any suggestions on ways around this limit? I really don't want to split the files (these will need to be extracted by my operators... need to keep things simple!), but am open to using any other method.

Thanks in advance

Replies are listed 'Best First'.
Re: Creating Archives for Files > 4GB
by Corion (Patriarch) on Jul 23, 2010 at 06:40 UTC

    I think this limit is inherent to the .zip file format. Zip (File Format) says that the format only allocates 4 bytes for the uncompressed file size.
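
    A quick sanity check of that 4-byte limit:

    perl -le 'print 2**32 - 1'   # 4294967295 bytes, i.e. just under 4GiB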

    You will have to find another format to store your compressed files.

Re: Creating Archives for Files > 4GB
by desemondo (Hermit) on Jul 23, 2010 at 12:55 UTC
    Take a look at IO::Compress::Zip

    It supports the (fairly) new Zip64 extension, which handles files much larger than the 4GB limit of the original zip format.

    Bear in mind, though, that unless you're using Windows 7 or a recent *nix, you'll need a 3rd party app such as Info-ZIP 6.x or later to extract the data (as I don't think IO::Uncompress::Unzip supports reading Zip64 files just yet; hopefully it's coming soon).

    IO::Uncompress::Unzip does support Zip64, thanks pmqs!

      Yes, IO::Compress::Zip supports Zip64 (which allows zip members to be larger than 4Gig). Here is an example of how to use it:

      use IO::Compress::Zip qw(:all);

      zip [ "bigfile1", "bigfile2" ] => "myzip.zip", Zip64 => 1
          or die "Cannot create zip file: $ZipError\n";
      IO::Uncompress::Unzip supports Zip64 as well.
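
      And for the extraction side, a minimal sketch along the same lines (if I remember the docs correctly, the one-shot unzip function only extracts the first member of the archive):

      use IO::Uncompress::Unzip qw(:all);

      # One-shot interface: inflates the first zip member to a file.
      unzip "myzip.zip" => "bigfile1"
          or die "Cannot extract zip file: $UnzipError\n";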
Re: Creating Archives for Files > 4GB
by Khen1950fx (Canon) on Jul 23, 2010 at 12:22 UTC
    Archive::Any::Create could make things a lot easier for you. It'll create an archive either in tar.gz or zip format. I'd go with the tar.gz. Here's an example:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Archive::Any::Create;

    my $archive = Archive::Any::Create->new;
    $archive->container('myStuff');    # top-level dir with your files
    $archive->add_file('stuff.pl', 'perl script');
    $archive->add_file('morestuff.pl', 'perl script');
    $archive->write_file('myStuff.tar.gz');
    # or
    # $archive->write_file('myStuff.zip');
    It can knock off a 6 GB tarball like it was 6 K.
      True, but gzip doesn't provide random access to individual files within the gzipped archive: the entire archive must be decompressed first, even if you only want to access one file (see the sketch below)...

      For some tasks/processes that is a BIG difference... for others, it probably doesn't matter much...
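
      If random access is what matters, zip keeps it, at the cost of the size limit discussed above. A rough sketch with Archive::Zip (the archive and member names are made up):

      use Archive::Zip qw(:ERROR_CODES);

      my $zip = Archive::Zip->new();
      $zip->read('myStuff.zip') == AZ_OK
          or die "Cannot read zip file\n";

      # Pull out a single member without inflating the rest.
      $zip->extractMember('stuff.pl') == AZ_OK
          or die "Cannot extract member\n";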
Re: Creating Archives for Files > 4GB
by aquarium (Curate) on Jul 23, 2010 at 07:20 UTC
    And apart from the zip format limitation, creating such huge zips is fraught with lengthy zipping/unzipping times and the risk of running out of space, since such operations need temporary additional storage.
    Maybe this part of the process needs to be re-evaluated and re-designed if possible?
    the hardest line to type correctly is: stty erase ^H
Re: Creating Archives for Files > 4GB
by furry_marmot (Pilgrim) on Jul 24, 2010 at 04:47 UTC

    What kind of data is it? In particular, are you compressing 6 GB files into zip archives? Or do you have a collection of much smaller files that are collectively 6 GB?

    Assuming you have the disk space, what if the information your operators are extracting was just sitting in directories? That may not be a solution, but if your system could function that way, then maybe you can think about other ways to approach what you're doing. A database would be another way, except then you'd have to maintain a database...

    --marmot

      Thanks for the reply

      I am talking about one file of 6GB of data. These files are just straight ASCII characters (financial institution statement files), so when using something like 7-Zip, they compress down reasonably well. I would use 7-Zip, but an overzealous auditor and manager has decreed that Open Source is bad.... (You wouldn't believe how much hassle I had to go through to get Perl approved! I am not even allowed to download modules from the CPAN :-( )

        Ye Gods! My condolences. :-(

        Well...hmmmm. My thoughts all go to breaking your file up into chunks. I mean, on the face of it, you've got a system where your data source drops 6GB files and actually expects someone to use them. But if you can't get Perl installed easily, you're probably not in a position to re-engineer your company's processes ("Excuse me sir, I think I'm smarter than you and...what's that?...yes, I like working here...oh...sorry...").

        So, for a chunk-wise example, you can use sysread() to read a monster text file (perhaps this one) in smallish chunks (512k, 4MB, whatever). You write a function like get_next_chunk() that manages the chunk-reading as needed, finding the start and end of the current "record", as defined by you. Then you write your main function with a while(get_next_record()){} loop, and it never has to know about sysread() or chunks at all.
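
        A minimal sketch of that idea, assuming newline-delimited records (the file name, chunk size and helper name are just placeholders):

        use strict;
        use warnings;

        my $CHUNK_SIZE = 4 * 1024 * 1024;   # 4MB per sysread()
        my $buffer     = '';

        open my $fh, '<', 'statements.txt' or die "Cannot open: $!";

        # Return the next newline-terminated record, refilling the
        # buffer with sysread() whenever it runs dry.
        sub get_next_record {
            until ($buffer =~ /\n/) {
                my $n = sysread($fh, $buffer, $CHUNK_SIZE, length $buffer);
                die "Read error: $!" unless defined $n;
                last if $n == 0;   # EOF
            }
            return undef unless length $buffer;
            $buffer =~ s/\A(.*?\n|.+)//s;   # peel off one record (or the final tail)
            return $1;
        }

        while (defined(my $record = get_next_record())) {
            # process $record here
        }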

        So now you abstract this a bit further. In some pre-process, you break your data into chunks (size dependent on memory and performance) and zip them separately. Then your get_next_record() function uses Archive::Zip or IO::Compress to read and decompress each chunk.
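
        Reading the pre-zipped chunks back might then look something like this (the chunkNNNN.zip naming scheme is made up for the example):

        use strict;
        use warnings;
        use IO::Uncompress::Unzip qw(unzip $UnzipError);

        # Inflate one pre-zipped chunk into memory and return it.
        sub read_chunk {
            my ($n) = @_;
            my $file = sprintf 'chunk%04d.zip', $n;
            return undef unless -e $file;
            my $data;
            unzip $file => \$data
                or die "Cannot inflate $file: $UnzipError\n";
            return $data;
        }

        my $n = 0;
        while (defined(my $chunk = read_chunk($n++))) {
            # feed $chunk to the record-splitting code above
        }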

        It might require a bit of glue in the middle of your current process, but this is where I'd start. I realize I'm talking through my hat here because I don't know anything about the structure of your files or what you're doing with them.

        Cheers!
        --marmot

        Update: Got back from an inspection and clarified my comments.