warthurton has asked for the wisdom of the Perl Monks concerning the following question:

I need to take a binary file and split it so that I can send it in smaller chunks.

I'm thinking about just reading in x bytes and saving out those bytes. Then I can read the chunks back and append each one to the output to recreate the original file.

My question is: is this the best way to go about it? I've been googling around to see if there is a best way to do this.

Should I read x + 100k for each piece to make sure that it matches up? I'm planning on taking an MD5 of the original and of each piece to verify that it is good. I will probably also zip and encrypt each piece. Those parts are easy.

I need this to work on an OS where zip and the Unix utilities will not be available.

Any ideas would be greatly appreciated.

Wayne

Replies are listed 'Best First'.
Re: Splitting a binary file
by graff (Chancellor) on Apr 01, 2005 at 02:38 UTC
    There are various ways to do this:
    • perldoc -f sysread
    • perldoc -f read
    • perldoc perlvar (look for $/, INPUT_RECORD_SEPARATOR)

    Just pick one method and stick with it: loop over the big file to read "x" bytes, open a suitable output, write the chunk, and close it. Don't repeat any data across chunks -- that will mess you up later.
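
    As a minimal sketch of that loop, assuming the read() approach: the name split_file and the "<file>.chunkNNN" naming are made up for illustration, and binmode is there because the target OS is not Unix.

```perl
use strict;
use warnings;

# Hypothetical helper: split $file into pieces of at most $chunk_size bytes.
# Returns the names of the chunk files that were written.
sub split_file {
    my ($file, $chunk_size) = @_;

    open(my $in, '<', $file) or die "Cannot open $file: $!";
    binmode $in;    # no newline translation on binary data

    my @names;
    my $n = 0;
    while (1) {
        my $buf;
        my $got = read($in, $buf, $chunk_size);
        die "Read error on $file: $!" unless defined $got;
        last if $got == 0;    # end of file

        # Zero-padded names so that a plain sort keeps the chunks in order.
        my $name = sprintf '%s.chunk%03d', $file, $n++;
        open(my $out, '>', $name) or die "Cannot create $name: $!";
        binmode $out;
        print {$out} $buf or die "Write error on $name: $!";
        close $out or die "Cannot close $name: $!";
        push @names, $name;
    }
    close $in;
    return @names;
}

# Command-line use: perl split.pl bigfile 1000000
if (@ARGV == 2) {
    print "$_\n" for split_file(@ARGV);
}
```

    Each chunk is written exactly once and no bytes are shared between neighbouring chunks, so plain concatenation restores the original.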

    Since you're talking about "sending" the chunks, I assume they're relatively small, so one chunk fits in memory. (If the chunks are too big, you'd need a nested loop to read "sub-chunks" that do fit, outputting each one to the "main chunk", then closing that when it's the right size, and moving on to the next main chunk.)
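
    That nested loop might look something like the sketch below. The names split_file_streaming and $buf_size are invented here; $buf_size is simply the largest piece you are willing to hold in memory at once.

```perl
use strict;
use warnings;

# Hypothetical helper: split $file into chunks of at most $chunk_size bytes
# while holding no more than $buf_size bytes in memory at a time.
sub split_file_streaming {
    my ($file, $chunk_size, $buf_size) = @_;

    open(my $in, '<', $file) or die "Cannot open $file: $!";
    binmode $in;

    my @names;
    my $n = 0;
    until (eof($in)) {                 # outer loop: one pass per main chunk
        my $name = sprintf '%s.chunk%03d', $file, $n++;
        open(my $out, '>', $name) or die "Cannot create $name: $!";
        binmode $out;

        my $left = $chunk_size;        # bytes still wanted in this chunk
        while ($left > 0) {            # inner loop: one pass per sub-chunk
            my $want = $left < $buf_size ? $left : $buf_size;
            my $buf;
            my $got = read($in, $buf, $want);
            die "Read error on $file: $!" unless defined $got;
            last if $got == 0;         # input ran out inside this chunk
            print {$out} $buf or die "Write error on $name: $!";
            $left -= $got;
        }
        close $out or die "Cannot close $name: $!";
        push @names, $name;
    }
    close $in;
    return @names;
}
```

    For example, split_file_streaming('fatset', 100_000_000, 65_536) would write 100 MB chunks while reading 64 KB at a time.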

    If the chunks are being stored as separate files, putting them back together again is really simple. Either a single shell command:

    cat fatset.chunk* > fatset
    or in perl:
    {
        local $/;       # sets input_record_separator to "undef"
        open(O, ">fatset");
        binmode O;      # keep binary data intact on non-Unix systems
        for my $f (sort <fatset.chunk*>) {
            open(I, $f);
            binmode I;
            $_ = <I>;   # reads entire chunk at once.
            print O;
        }
        close O;
    }
    Just be sure that if you need to make more than 9 chunks, you use leading zeros in the file names for the low-numbered chunks, so that the names will sort naturally into their proper order. Use sprintf( 'fatset.chunk%03d', $chunknumber++ ) for that.

    Naturally, if you use the perl version, add some error checking...
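
    For what it's worth, here is one way the join might look with that error checking added. This is only a sketch: join_chunks is a made-up name, and it assumes each chunk fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate the given chunk files, in sorted name
# order, into $target.
sub join_chunks {
    my ($target, @chunks) = @_;

    open(my $out, '>', $target) or die "Cannot create $target: $!";
    binmode $out;

    for my $name (sort @chunks) {
        open(my $in, '<', $name) or die "Cannot open $name: $!";
        binmode $in;
        local $/;                      # slurp mode: read the whole chunk
        my $data = <$in>;
        die "Read error on $name: $!" unless defined $data;
        print {$out} $data or die "Write error on $target: $!";
        close $in;
    }
    close $out or die "Cannot close $target: $!";
    return;
}
```

    It would be called as join_chunks('fatset', glob 'fatset.chunk*').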

Re: Splitting a binary file
by cazz (Pilgrim) on Mar 31, 2005 at 22:46 UTC
    Why reinvent the wheel? There are a ton implemented already. binary splitter
      I was hoping to do it in perl. It's part of a larger project.
Re: Splitting a binary file
by BrowserUk (Patriarch) on Apr 01, 2005 at 04:48 UTC

    I tore out some other stuff and added a (little) error checking to these scripts, which I wrote a while back. Much more could be done to them, but they served my purpose and may be of use to you.

    splitf.pl

    joinf.pl


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?