Ever get a gigantic data file that you needed to chop into small pieces for processing, because the next tool you were passing it to just couldn't handle that much at once? Or maybe a server log that somebody forgot to rotate? Here's the easy solution: pass this one-liner the files you need to subdivide (e.g. error_log), and you'll get out bite-sized files labeled error_log.chunk01, error_log.chunk02 and so forth.

Note that a bite is defined in this case to be 10,000 lines; if you want bigger bites, you can always change that number. Also note that, thanks to Perl's magic string increment, bite #100 will be named error_log.chunl00 if you ever get that high (which you probably don't want anyway).

perl -pe '$chnk = "chunk01" if 1 == $.; open (STDOUT, ">", $ARGV . "." . $chnk++) unless ($. - 1) % 10_000; close ARGV if eof' big_file_one big_file_two
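
If the one-liner is too dense, here is the same idea unrolled into a standalone script -- a longhand sketch of the approach, with a $lines_per_chunk name added purely for readability:

#!/usr/bin/perl
# Longhand version of the one-liner: copy each input file into
# numbered .chunkNN pieces of 10_000 lines apiece.
use strict;
use warnings;

my $lines_per_chunk = 10_000;
my $chnk;

while (<>) {
    # $. restarts at 1 for every input file (because of the close
    # below), so each file gets its own chunk01, chunk02, ... series.
    $chnk = "chunk01" if $. == 1;

    # Start a fresh output file every $lines_per_chunk lines.
    # Reopening STDOUT steers the print below into the current piece;
    # Perl's magic string increment carries "chunk01" to "chunk02" ...
    # and "chunk99" over to "chunl00".
    unless ( ($. - 1) % $lines_per_chunk ) {
        my $piece = "$ARGV." . $chnk++;
        open STDOUT, ">", $piece or die "can't write $piece: $!";
    }

    print;               # what -p did for us in the one-liner

    close ARGV if eof;   # reset $. at the end of each input file
}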

Re: Chunk large data/log files into more manageable pieces (split(1))
by grinder (Bishop) on Jun 19, 2003 at 19:13 UTC

    In such circumstances, I'd probably use split(1) (i.e. the Unix tool, not the Perl function).

    split -l 10000 big_file to split into 10000-record files.

    split -b 10000 big_file to split into 10000-byte files.
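
    By default split(1) names its pieces xaa, xab and so on; the optional name operand sets a different prefix, so, as a sketch of the same 10000-line case,

    split -l 10000 big_file big_file.chunk

    leaves you with big_file.chunkaa, big_file.chunkab, etc., next to the original. GNU split also takes -d if you'd rather have numeric suffixes.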

    It would probably be faster, too. (But ++ all the same, thinking of Windows, where you won't find split(1).)

    _____________________________________________
    Come to YAPC::Europe 2003 in Paris, 23-25 July 2003.