in reply to Creating Archives for Files > 4GB

What kind of data is it? In particular, are you compressing 6 GB files into zip archives? Or do you have a collection of much smaller files that are collectively 6 GB?

Assuming you have the disk space, what if the information your operators are extracting was just sitting in directories? That may not be a solution, but if your system could function that way, then maybe you can think about other ways to approach what you're doing. A database would be another way, except then you'd have to maintain a database...

--marmot

Re^2: Creating Archives for Files > 4GB
by hoffy (Acolyte) on Jul 26, 2010 at 00:06 UTC

    Thanks for the reply

    I am talking about one file of 6 GB of data. These files are just straight ASCII characters (financial institution statement files), so when using something like 7-Zip, they compress down reasonably well. I would use 7-Zip, but an overzealous auditor and manager has decreed that Open Source is bad.... (You wouldn't believe how much hassle I had to go through to get Perl approved! I am not even allowed to download modules from the CPAN :-( )

      Ye Gods! My condolences. :-(

      Well...hmmmm. My thoughts all go to breaking your file up into chunks. I mean, on the face of it, you've got a system where your data source drops 6GB files and actually expects someone to use them. But if you can't get Perl installed easily, you're probably not in a position to re-engineer your company's processes ("Excuse me sir, I think I'm smarter than you and...what's that?...yes, I like working here...oh...sorry...").

      So, for a chunk-wise example, you can use sysread() to read a monster text file (perhaps this one) in smallish chunks (512 KB, 4 MB, whatever). You write a function like get_next_record() that manages the chunk-reading as needed, finding the start and end of the current "record", as defined by you. Then you write your main function with a while(get_next_record()){} loop, and it never has to know about sysread() or chunks at all.
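
      Roughly the shape I have in mind (untested; it assumes newline-terminated records and a made-up file name, so adjust for your actual statement layout):

          use strict;
          use warnings;

          my $CHUNK_SIZE = 4 * 1024 * 1024;   # read 4 MB per sysread() call
          my $buffer     = '';

          open my $fh, '<', 'statements.txt' or die "open: $!";

          # Hands back one complete record at a time; the caller never
          # sees sysread() or chunk boundaries.
          sub get_next_record {
              while ( $buffer !~ /\n/ ) {
                  my $bytes = sysread( $fh, $buffer, $CHUNK_SIZE, length $buffer );
                  die "sysread: $!" unless defined $bytes;
                  last if $bytes == 0;              # end of file
              }
              return undef if $buffer eq '';
              $buffer =~ s/^(.*?\n|.+)//s;          # peel off one record (or the tail)
              return $1;
          }

          while ( defined( my $record = get_next_record() ) ) {
              # ... do whatever your operators need with one record ...
          }
          close $fh;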

      So now you abstract this a bit further. In some pre-process, you break your data into chunks (size dependent on memory and performance) and zip them separately. Then your get_next_record() function uses Archive::Zip or the IO::Uncompress modules (from the IO-Compress distribution) to read and decompress each chunk.
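
      Off the top of my head, the two halves might look something like this (untested; the chunk size, file names, and one-record-per-line assumption are all placeholders):

          use strict;
          use warnings;
          use IO::Compress::Zip     qw($ZipError);
          use IO::Uncompress::Unzip qw($UnzipError);

          # Pre-process: split the monster file into separately zipped chunks.
          my $RECORDS_PER_CHUNK = 100_000;
          open my $in, '<', 'statements.txt' or die "open: $!";
          my ( $chunk, $count, $zip ) = ( 0, 0, undef );
          while ( my $line = <$in> ) {
              if ( !$zip or $count >= $RECORDS_PER_CHUNK ) {
                  $zip->close if $zip;
                  $zip = IO::Compress::Zip->new(
                      sprintf( 'chunk_%03d.zip', $chunk++ ), Name => 'records.txt' )
                      or die "zip: $ZipError";
                  $count = 0;
              }
              $zip->print($line);
              $count++;
          }
          $zip->close if $zip;
          close $in;

          # Reader: get_next_record() walks the chunk files and decompresses
          # them one at a time, so only one chunk is ever open at once.
          my @chunks = sort glob 'chunk_*.zip';
          my $unzip;

          sub get_next_record {
              while (1) {
                  if ( !$unzip ) {
                      my $file = shift @chunks or return undef;
                      $unzip = IO::Uncompress::Unzip->new($file)
                          or die "unzip: $UnzipError";
                  }
                  my $line = $unzip->getline;
                  return $line if defined $line;
                  $unzip = undef;    # chunk exhausted, move on to the next one
              }
          }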

      It might require a bit of glue in the middle of your current process, but this is where I'd start. I realize I'm talking through my hat here because I don't know anything about the structure of your files or what you're doing with them.

      Cheers!
      --marmot

      Update: Got back from an inspection and clarified my comments.

        Again, thanks for your input there, marmot

        In the end, though, the simplest options are often the best. IO::Compress::Zip is probably the easiest to use for what I need. It appears that my intended audience has the right tools to deal with Zip64 files (always a good thing to do the research in the first place, instead of heading into tangent land).
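
        For anyone who finds this later, the one-shot interface is about as simple as it gets (file names are just examples; Zip64 => 1 is the bit that lets the archive hold members over 4 GB):

            use strict;
            use warnings;
            use IO::Compress::Zip qw(zip $ZipError);

            # Compress the big statement file into a Zip64-enabled archive.
            zip 'statement.txt' => 'statement.zip',
                Name  => 'statement.txt',
                Zip64 => 1
                or die "zip failed: $ZipError\n";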

        So, on that note, I am assuming that this is case closed! Thanks to everyone who gave me their input!

        hoffy