
Thanks for the reply

I am talking about one file of 6GB of data. These files are just straight ASCII characters (financial institution statement files), so they compress reasonably well with something like 7-Zip. I would use 7-Zip, but an overzealous auditor and manager have decreed that Open Source is bad.... (You wouldn't believe how much hassle I had to go through to get Perl approved! I am not even allowed to download modules from the CPAN :-( )


Re^3: Creating Archives for Files > 4GB
by furry_marmot (Pilgrim) on Jul 26, 2010 at 15:03 UTC

    Ye Gods! My condolences. :-(

    Well...hmmmm. My thoughts all go to breaking your file up into chunks. I mean, on the face of it, you've got a system where your data source drops 6GB files and actually expects someone to use them. But if you can't get Perl installed easily, you're probably not in a position to re-engineer your company's processes ("Excuse me sir, I think I'm smarter than you and...what's that?...yes, I like working here...oh...sorry...").

    So, for a chunk-wise example, you can use sysread() to read a monster text file (perhaps this one) in smallish chunks (512KB, 4MB, whatever). You write a function like get_next_chunk() that manages the chunk-reading as needed, and a get_next_record() on top of it that finds the start and end of the current "record", as defined by you. Then you write your main function with a while(get_next_record()){} loop, and it never has to know about sysread() or chunks at all.
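
A rough sketch of that idea, assuming a "record" is simply a newline-terminated line (your real record boundaries may well differ). The name make_record_reader() is made up for illustration, and the chunk handling is folded into one closure rather than a separate get_next_chunk():

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that hands back one record per call,
# refilling an internal buffer with sysread() whenever it runs dry.
# Assumption: records are newline-terminated lines.
sub make_record_reader {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 4 * 1024 * 1024;    # default to 4MB per sysread()
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buffer = '';
    my $eof    = 0;

    return sub {
        while (1) {
            # If the buffer already holds a complete record, remove and return it.
            my $pos = index $buffer, "\n";
            return substr $buffer, 0, $pos + 1, '' if $pos >= 0;

            if ($eof) {
                # The last record may have no trailing newline.
                return unless length $buffer;
                return substr $buffer, 0, length($buffer), '';
            }

            # Append the next chunk to whatever partial record is left over.
            my $read = sysread $fh, $buffer, $chunk_size, length $buffer;
            die "Read error on $path: $!" unless defined $read;
            $eof = 1 if $read == 0;
        }
    };
}

# Usage (the file name is a placeholder):
#   my $get_next_record = make_record_reader('statements.txt', 512 * 1024);
#   while (defined(my $record = $get_next_record->())) {
#       # process $record; the loop never sees sysread() or chunks
#   }
```

The main loop tests with defined() so that a record consisting of "0" does not end the loop early.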

    So now you abstract this a bit further. In some pre-process, you break your data into chunks (size dependent on memory and performance) and zip them separately. Then your get_next_record() function uses Archive::Zip or IO::Compress to read and decompress each chunk.
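
Something along these lines might do for the pre-process and the reader, using IO::Compress::Zip and IO::Uncompress::Unzip (both ship with recent Perls, so no CPAN download should be needed). This is only a sketch: split_and_zip(), make_zipped_record_reader(), the part naming scheme and the line-per-record assumption are all invented for the example.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw($ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical pre-process: split $path on line boundaries into parts of
# roughly $max_bytes each, and write every part to its own zip archive.
sub split_and_zip {
    my ($path, $prefix, $max_bytes) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my (@parts, $zip);
    my $written = 0;
    while (my $line = <$in>) {
        # Start a new archive for the first line or once the part is full.
        if (!$zip || $written >= $max_bytes) {
            $zip->close if $zip;
            my $name   = sprintf '%s.%04d.zip', $prefix, scalar @parts;
            my $member = sprintf 'part%04d.txt', scalar @parts;
            $zip = IO::Compress::Zip->new($name, Name => $member)
                or die "Cannot create $name: $ZipError";
            push @parts, $name;
            $written = 0;
        }
        $zip->print($line);
        $written += length $line;
    }
    $zip->close if $zip;
    close $in;
    return @parts;    # archive names, in reading order
}

# Hypothetical reader: returns a closure that yields one line per call,
# moving on to the next archive when the current one is exhausted.
sub make_zipped_record_reader {
    my @parts = @_;
    my $unzip;
    return sub {
        while (1) {
            if (!$unzip) {
                my $name = shift @parts;
                return unless defined $name;    # no archives left
                $unzip = IO::Uncompress::Unzip->new($name)
                    or die "Cannot open $name: $UnzipError";
            }
            my $line = $unzip->getline;
            return $line if defined $line;
            $unzip->close;
            undef $unzip;
        }
    };
}
```

Because the split happens on line boundaries, no record straddles two archives, which keeps the reader simple.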

    It might require a bit of glue in the middle of your current process, but this is where I'd start. I realize I'm talking through my hat here because I don't know anything about the structure of your files or what you're doing with them.

    Cheers!
    --marmot

    Update: Got back from an inspection and clarified my comments.

      Again, thanks for your input there, marmot.

      In the end, though, the simplest options are often the best. IO::Compress::Zip is probably the easiest to use for what I need. It appears that my intended audience has the right tools to deal with Zip64 files (always a good thing to do the research in the first place, instead of heading into tangent land).
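
For anyone finding this later, a minimal sketch of the IO::Compress::Zip route, using its one-shot zip() function with the Zip64 option turned on. The wrapper zip_large_file() and the file names are placeholders, not what was actually deployed.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

# Hypothetical wrapper: compress one file into a Zip64 archive.
# Zip64 => 1 is what permits members larger than 4GB.
sub zip_large_file {
    my ($input, $output, $member_name) = @_;
    zip $input => $output,
        Name  => $member_name,    # name stored inside the archive
        Zip64 => 1
        or die "zip failed for $input: $ZipError";
    return $output;
}

# Usage (placeholder names):
#   zip_large_file('statements.txt', 'statements.zip', 'statements.txt');
```

Whoever unpacks the result needs a Zip64-aware tool, so it is worth confirming that with the recipients first.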

      So, on that note, I am assuming that this is case closed! Thanks to everyone who gave me their input!

      hoffy