Re: Creating Archives for Files > 4GB
by Corion (Patriarch) on Jul 23, 2010 at 06:40 UTC
I think this limit is inherent to the .zip file format. Zip (File Format) says that the format only allocates 4 bytes for the uncompressed file size.
You will have to find another format to store your compressed files.
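(For the arithmetic behind that: a 4-byte unsigned size field tops out at 2**32 - 1 bytes, i.e. just under 4 GiB.)
perl -le 'print 2**32'   # 4294967296 bytes, the classic 4 GB zip ceiling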
Re: Creating Archives for Files > 4GB
by desemondo (Hermit) on Jul 23, 2010 at 12:55 UTC
Take a look at IO::Compress::Zip.
It supports the (fairly) new Zip64 standard, which handles files much larger than the classic 4 GB zip limit.
Bear in mind, though, that unless you're using Windows 7 or a recent *nix, you'll need a third-party app to extract the data later, such as Info-ZIP 6.x or later (as I don't think IO::Uncompress::Unzip supports reading Zip64 files just yet; hopefully it's coming soon).
Update: IO::Uncompress::Unzip does support Zip64, thanks pmqs!
Yes, IO::Compress::Zip supports Zip64 (which allows zip members to be larger than 4 GB). Here is an example of how to use it:
use IO::Compress::Zip qw(:all);
zip [ "bigfile1", "bigfile2"] => "myzip.zip",
Zip64 => 1
or die "Cannot create zip file: $ZipError\n";
IO::Uncompress::Unzip supports Zip64 as well.
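Going the other way, here is a minimal sketch (filenames invented; by default the one-shot unzip interface extracts the first member of the archive):
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Pull the first member of the Zip64 archive back out to disk
unzip "myzip.zip" => "bigfile1.out"
    or die "Cannot unzip file: $UnzipError\n";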
Re: Creating Archives for Files > 4GB
by Khen1950fx (Canon) on Jul 23, 2010 at 12:22 UTC
Archive::Any::Create could make things a lot easier for you. It'll create an archive in either tar.gz or zip format. I'd go with the tar.gz. Here's an example:
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Any::Create;

my $archive = Archive::Any::Create->new;
$archive->container('myStuff');                 # top-level dir holding your files
$archive->add_file('stuff.pl', 'perl script');  # second arg is the file's contents
$archive->add_file('morestuff.pl', 'perl script');
$archive->write_file('myStuff.tar.gz');
# or
# $archive->write_file('myStuff.zip');
It can knock off a 6 GB tarball like it was 6 K.
True; however, gzip doesn't provide random access to individual files within the gzipped archive. The entire archive must be decompressed first, even if you only want to access one file.
For some tasks/processes that is a BIG difference... for others, it probably doesn't matter much.
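For instance, pulling even one member out of a .tar.gz with Archive::Tar still means reading and decompressing the whole stream. Roughly (paths borrowed from the example above, purely illustrative):
use Archive::Tar;

# new() walks the entire compressed stream just to build its member
# list, even though we only want a single file out of it
my $tar = Archive::Tar->new('myStuff.tar.gz', 1);   # 1 => compressed
$tar->extract_file('myStuff/stuff.pl')
    or die "Could not extract: " . $tar->error;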
Re: Creating Archives for Files > 4GB
by aquarium (Curate) on Jul 23, 2010 at 07:20 UTC
Re: Creating Archives for Files > 4GB
by furry_marmot (Pilgrim) on Jul 24, 2010 at 04:47 UTC
What kind of data is it? In particular, are you compressing 6 GB files into zip archives? Or do you have a collection of much smaller files that are collectively 6 GB?
Assuming you have the disk space, what if the information your operators are extracting was just sitting in directories? That may not be a solution, but if your system could function that way, then maybe you can think about other ways to approach what you're doing. A database would be another way, except then you'd have to maintain a database...
--marmot
Thanks for the reply.
I am talking about one file of 6 GB of data. These files are just straight ASCII characters (financial institution statement files), so when using something like 7-Zip they compress down reasonably well. I would use 7-Zip, but an overzealous auditor and manager has decreed that Open Source is bad... (You wouldn't believe how much hassle I had to go through to get Perl approved! I am not even allowed to download modules from CPAN :-( )
Ye Gods! My condolences. :-(
Well...hmmmm. My thoughts all go to breaking your file up into chunks. I mean, on the face of it, you've got a system where your data source drops 6GB files and actually expects someone to use them. But if you can't get Perl installed easily, you're probably not in a position to re-engineer your company's processes ("Excuse me sir, I think I'm smarter than you and...what's that?...yes, I like working here...oh...sorry...").
So, for a chunk-wise example, you can use sysread() to read a monster text file (perhaps this one) in smallish chunks (512 KB, 4 MB, whatever). You write a function like get_next_record() that manages the chunk-reading as needed, finding the start and end of the current "record", as defined by you. Then you write your main function with a while (get_next_record()) {...} loop, and it never has to know about sysread() or chunks at all (there's a rough sketch below).
So now you abstract this a bit further. In some pre-process, you break your data into chunks (size dependent on memory and performance) and zip them separately. Then your get_next_record() function uses Archive::Zip or IO::Compress to read and decompress each chunk.
It might require a bit of glue in the middle of your current process, but this is where I'd start. I realize I'm talking through my hat here because I don't know anything about the structure of your files or what you're doing with them.
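Here's a rough sketch of the chunk-reading part, just to make the shape concrete. Everything in it is made up for illustration: the filename, the 4 MB chunk size, and the assumption that a "record" is simply a newline-terminated line (yours will be whatever the statement format dictates).
use strict;
use warnings;

my $CHUNK  = 4 * 1024 * 1024;   # read 4 MB at a time
my $buffer = '';

open my $fh, '<', 'statements.txt' or die "open: $!";
binmode $fh;

# Refill the buffer with sysread() until it holds at least one full
# record, then peel that record off and return it. Returns undef at EOF.
sub get_next_record {
    while ($buffer !~ /\n/) {
        my $read = sysread $fh, $buffer, $CHUNK, length $buffer;
        die "sysread: $!" unless defined $read;
        if ($read == 0) {                      # end of file
            return undef if $buffer eq '';     # nothing left at all
            last;                              # flush the final partial record
        }
    }
    $buffer =~ s/^(.*?\n|.+)//s;
    return $1;
}

while (defined(my $record = get_next_record())) {
    # process $record here -- the main loop never sees sysread() or chunks
}
The pre-chunked-and-zipped variant would look much the same, except that get_next_record() would read each compressed chunk through IO::Uncompress::Unzip (or Archive::Zip) instead of sysread()ing the raw file.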
Cheers!
--marmot
Update: Got back from an inspection and clarified my comments.