The GNU utility split can split on line counts, which is close to what you want: divide the target piece size by the number of bytes per line (characters per line, +1 for the line feed, or +2 if carriage returns are used as well) to get the line count per piece. See the split man page, option -l.
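A small-scale sketch of the idea (the file name bigfile.dat, the piece_ prefix, and the 80-character record width in the comment are placeholders, not anything from the thread):

```shell
# Demo at toy scale: split a 10-line file into 3-line pieces.
# With real fixed-width data you would compute lines-per-piece as
# target_bytes / bytes_per_line, e.g. 1_000_000_000 / 81 = 12345679
# for 80-character records plus a newline.
seq 1 10 > bigfile.dat           # stand-in for the 10 GB input
split -l 3 bigfile.dat piece_    # writes piece_aa, piece_ab, piece_ac, piece_ad
```

Note this only works if every record group has the same byte length; it ignores the requirement below that each piece start on a line beginning with 100.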
Update: Sorry - missed the criterion that the first line of each piece must begin with 100.
Untested and needs tidying:
use strict;
use warnings;

my $MAX_FILE_SIZE = 1_000_000_000;    # ~1 GB per output piece

my $num = 0;
my $next_outfile = sub {
    open my $OUT, '>', 'file_' . (++$num) or die $!;
    return $OUT;
};

my $OUTPUT;
my $curr_size;
my $process_chunk = sub {
    my $chunk = shift;
    if (not defined $curr_size or $curr_size + length($chunk) > $MAX_FILE_SIZE) {
        $OUTPUT = $next_outfile->();
        $curr_size = 0;
    }
    $curr_size += length($chunk);
    print $OUTPUT $chunk;
};

open my $INPUT, '<', 'input.dat' or die $!;    # substitute your input file
my $chunk = '';
while (my $line = <$INPUT>) {
    if ($line =~ /^100/) {
        $process_chunk->($chunk) if length $chunk;    # skip the empty first chunk
        $chunk = '';
    }
    $chunk .= $line;
}
$process_chunk->($chunk) if length $chunk;    # don't drop the final record
Reading guide: the code is easiest to follow starting from the while loop at the bottom.
In reply to Re: Can I split a 10GB file into 1 GB sizes using my repeating data pattern
by mzedeler
in thread Can I split a 10GB file into 1 GB sizes using my repeating data pattern
by Anonymous Monk