Ever get a gigantic data file that you needed to chop into small pieces for processing, because the next tool you were passing it to just couldn't handle that much at once? Or maybe a server log that somebody forgot to rotate? Here's the easy solution: pass this one-liner the files you need to subdivide (e.g. error_log), and you'll get out bite-sized files labeled error_log.chunk01, error_log.chunk02 and so forth.

Note that a bite is defined in this case to be 10,000 lines; if you want bigger bites, you can always change that number. Also note that, thanks to Perl's magic string increment, bite #100 will be named error_log.chunl00 if you ever get that high (which you probably don't want anyway).

perl -pe '$chnk = "chunk01" if 1 == $.; open (STDOUT, ">", $ARGV . "." . $chnk++) unless ($. - 1) % 10_000; close ARGV if eof' big_file_one big_file_two
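
If the one-liner is too dense, here is the same idea unrolled into a standalone script -- a longhand sketch of the approach, with a $lines_per_chunk name added purely for readability:

#!/usr/bin/perl
# Longhand version of the one-liner: copy each input file into
# numbered .chunkNN pieces of 10_000 lines apiece.
use strict;
use warnings;

my $lines_per_chunk = 10_000;
my $chnk;

while (<>) {
    # $. restarts at 1 for every input file (because of the close
    # below), so each file gets its own chunk01, chunk02, ... series.
    $chnk = "chunk01" if $. == 1;

    # Start a fresh output file every $lines_per_chunk lines.
    # Reopening STDOUT steers the print below into the current piece;
    # Perl's magic string increment carries "chunk01" to "chunk02" ...
    # and "chunk99" over to "chunl00".
    unless ( ($. - 1) % $lines_per_chunk ) {
        my $piece = "$ARGV." . $chnk++;
        open STDOUT, ">", $piece or die "can't write $piece: $!";
    }

    print;               # what -p did for us in the one-liner

    close ARGV if eof;   # reset $. at the end of each input file
}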

Re: Chunk large data/log files into more manageable pieces (split(1))
by grinder (Bishop) on Jun 19, 2003 at 19:13 UTC

    In such circumstances, I'd probably use split(1) (i.e. the Unix tool, not the Perl function).

    split -l 10000 big_file to split into 10000-record files.

    split -b 10000 big_file to split into 10000-byte files.
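
    By default split(1) names its pieces xaa, xab and so on; the optional name operand sets a different prefix, so, as a sketch of the same 10000-line case,

    split -l 10000 big_file big_file.chunk

    leaves you with big_file.chunkaa, big_file.chunkab, etc., next to the original. GNU split also takes -d if you'd rather have numeric suffixes.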

    It would probably be faster, too. (But ++ all the same, thinking of Windows, where you won't find split(1).)

    _____________________________________________
    Come to YAPC::Europe 2003 in Paris, 23-25 July 2003.