in reply to Can I split a 10GB file into 1 GB sizes using my repeating data pattern
The GNU utility split can split on line counts, which seems very close to what you want, provided your lines are all the same length: divide the target file size by the number of bytes per line (the characters on the line, +1 for the line feed, or +2 if there is a carriage return as well) and pass the result to option -l. See the split man page.
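As a rough sketch of that arithmetic, the snippet below works out a line count for split -l. The helper name lines_per_file, the 80-character line length, the 1 GB target and the file names bigfile.dat and part_ are all made up for illustration; substitute your real values. It only makes sense if every line has the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: how many whole lines fit into a target file size.
# $eol_bytes is 1 for LF line endings, 2 for CR+LF.
sub lines_per_file {
    my ($target_bytes, $chars_per_line, $eol_bytes) = @_;
    return int( $target_bytes / ($chars_per_line + $eol_bytes) );
}

# Assumed values: 1 GB target, 80-character lines, LF line endings.
my $count = lines_per_file(1_000_000_000, 80, 1);

# Print the split command rather than running it.
print "split -l $count bigfile.dat part_\n";
```

With those assumed numbers this prints a command using -l 12345679. Note that it does nothing to make each output file start on a line beginning with 100.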
Update: Sorry - I missed the requirement that the first line of each file must begin with 100.
Untested and needs tidying:
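    use strict;
    use warnings;

    my $MAX_FILE_SIZE = 1_000_000_000;    # 1 GB per output file
    my $num = 0;

    # Open the next numbered output file (file_1, file_2, ...) and return its handle.
    my $next_outfile = sub {
        open my $OUT, '>', 'file_' . (++$num) or die $!;
        return $OUT;
    };

    my $OUTPUT;
    my $curr_size;

    # Write one chunk, starting a new file first if the chunk would not fit.
    my $process_chunk = sub {
        my $chunk = shift;
        if (not defined $curr_size or $curr_size + length($chunk) > $MAX_FILE_SIZE) {
            $OUTPUT = $next_outfile->();
            $curr_size = 0;
        }
        $curr_size += length($chunk);
        print $OUTPUT $chunk;
    };

    # A chunk is everything from one line starting with 100 up to the next.
    # Input is read from STDIN here.
    my $chunk = '';
    while (my $line = <STDIN>) {
        if ($line =~ /^100/ and length $chunk) {
            $process_chunk->($chunk);
            $chunk = '';
        }
        $chunk .= $line;
    }
    $process_chunk->($chunk) if length $chunk;    # flush the final chunk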
Reading guide: code is best understood by starting with the while loop at the bottom.
Re^2: Can I split a 10GB file into 1 GB sizes using my repeating data pattern
by Limbic~Region (Chancellor) on Jul 22, 2009 at 22:35 UTC