in reply to File splitting help
Either way you're going to have to read chunks of data and write chunks of data.
A suggestion, then: use read or sysread to pull in large chunks of the file at a time (say, 64 kilobytes) into a buffer, and keep a counter of your position in the current chunk. Write each buffer to your chunk file; however, if the counter would exceed your chunk length (e.g. 400MB), scan backwards for the last newline character using rindex. Flush the buffer up to that newline, close the chunk file, then write the remainder of the buffer to a new chunk file, reset your chunk length counter, and continue.
Some pseudo-code (this is _not_ Perl):
    chunknum = 0;
    while ( ! eof( FIN ) ) {
        chunklen = 0;
        open( FOUT, ">chunk" . chunknum++ );
        # read into buffer, appending after any leftover from the last chunk
        while ( len = read( FIN, buffer, 64000, length( buffer ) ) ) {
            if ( chunklen + len > 400MB ) {
                # got to end of chunk, deal with newline
                lastnewline = rindex( buffer, "\n" );
                if ( lastnewline >= 0 ) {    # rindex returns -1 if not found
                    # flush up to and including the last newline
                    write( FOUT, substr( buffer, 0, lastnewline + 1 ) );
                    substr( buffer, 0, lastnewline + 1 ) = "";  # keep leftover
                    close( FOUT );
                    last;    # skip to next chunk file
                } else {
                    # no newline found: flush entire buffer, keep reading
                    write( FOUT, buffer );
                    buffer = "";
                    chunklen += len;
                }
            } else {
                # not at end of chunk, just write the buffer
                write( FOUT, buffer );
                buffer = "";
                chunklen += len;
            }
        } # while we've got something to read
    } # while not at eof of input
    # flush any leftover and close the final chunk file
    if ( length( buffer ) ) { write( FOUT, buffer ); }
    close( FOUT );
Update: had to ensure the read appended to the end of the buffer (to preserve leftovers), and that the chunk file is closed when done.
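For anyone who wants something they can actually run, here is the same approach as a Python sketch (Python's bytes.rfind plays the role of rindex). This is only an illustration of the technique above, not a drop-in solution: the output naming ("src.chunkN") is made up, and it assumes a newline appears in every buffer once the limit is reached, so a chunk can overshoot the limit by at most one buffer's worth.

```python
def split_on_newlines(src, chunk_limit, bufsize=64_000):
    """Split file `src` into src.chunk0, src.chunk1, ... of roughly
    chunk_limit bytes each, cutting only at newline boundaries.
    Assumes lines are much shorter than chunk_limit."""
    chunknum = 0
    written = 0  # bytes written to the current chunk so far
    out = open(f"{src}.chunk{chunknum}", "wb")
    with open(src, "rb") as fin:
        while True:
            buffer = fin.read(bufsize)
            if not buffer:
                break
            if written + len(buffer) > chunk_limit:
                # over the limit: cut at the last newline in the buffer
                cut = buffer.rfind(b"\n")  # assumed >= 0 here
                out.write(buffer[:cut + 1])  # flush up to the newline
                out.close()
                chunknum += 1                # start the next chunk file
                out = open(f"{src}.chunk{chunknum}", "wb")
                out.write(buffer[cut + 1:])  # carry the leftover forward
                written = len(buffer) - (cut + 1)
            else:
                out.write(buffer)
                written += len(buffer)
    out.close()
    return chunknum + 1  # number of chunk files produced
```

Concatenating the chunks reproduces the original file, and every chunk except possibly the last ends on a newline.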