in reply to Help spliting file into chunks

The logic you need is, in pseudocode:

    while there's a next line
        extract the ID
        check if ID matches the previous ID
        if not:
            set previous ID to current ID
            increment a counter
            if counter > 100
                close current output file
                open a new output file
                reset counter
        write the line to the current output file.

Try to write that in Perl; you can translate it almost directly. If you have trouble with a specific step, show us what you've tried and where the problem is.

Re^2: Help spliting file into chunks
by ikegami (Patriarch) on Jul 28, 2009 at 16:03 UTC

    That can create empty files. Fix:

        while there's a next line
            extract the ID
            check if ID matches the previous ID
            if not:
                set previous ID to current ID
                increment a counter
                if counter > 100
                    close current output file
                    reset counter
            if output file isn't open
                open a new output file
            write the line to the current output file.
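
    For illustration, here is one way the lazy-open pseudocode above might look in Perl. It is only a sketch: split_by_id, its parameters and the output file naming are hypothetical, and it assumes the ID is the first whitespace-delimited field of each line. Like the pseudocode, it counts IDs rather than lines. The limit is checked before the new ID is counted, so that no file ends up with more IDs than the limit.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in into files named "${prefix}0000",
# "${prefix}0001", ... so that each file holds at most $max_ids distinct IDs.
# Assumes lines sharing an ID are adjacent in the input.
sub split_by_id {
    my ($in, $max_ids, $prefix) = @_;
    die "max_ids must be at least 1\n" if $max_ids < 1;

    my ($prev_id, $out, @files);
    my $ids_in_file = 0;

    while (my $line = <$in>) {
        my ($id) = $line =~ /^(\S+)/;
        next if !defined $id;    # this sketch skips lines that have no ID

        if (!defined $prev_id || $id ne $prev_id) {
            $prev_id = $id;
            # Rotate before counting the new ID.
            if ($ids_in_file >= $max_ids) {
                close($out) or die "Error closing $files[-1]: $!\n";
                undef $out;
                $ids_in_file = 0;
            }
            $ids_in_file++;
        }

        # Open lazily: a file is created only when there is a line to write.
        if (!defined $out) {
            my $fn = sprintf('%s%04d', $prefix, scalar @files);
            open($out, '>', $fn) or die "Error creating $fn: $!\n";
            push @files, $fn;
        }
        print {$out} $line;
    }

    if (defined $out) {
        close($out) or die "Error closing $files[-1]: $!\n";
    }
    return @files;
}
```

    It could be called as split_by_id(\*STDIN, 100, 'file'); the return value is the list of files it created.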

    That pseudocode can also create files with more than 100 records. Fix:

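        use strict;
        use warnings;

        my $last_id;
        my @group;
        my $fh;
        my $line_counter = 0;
        my $file_counter = 0;

        # Write the buffered group, starting a new file first if it wouldn't fit.
        sub output {
            if ($line_counter + @group > 100) {
                $fh = undef;    # dropping the handle closes the current file
                $line_counter = 0;
            }
            if (!defined($fh)) {
                my $fn = sprintf('file%04d', $file_counter++);
                open($fh, '>', $fn)
                    or die("Error creating file $fn: $!\n");
            }
            $line_counter += @group;
            print {$fh} splice(@group);    # write the group and empty the buffer
        }

        while (<>) {
            my ($id) = /^(\S+)/;
            # On an ID change, flush the finished group before buffering this line.
            if (defined($last_id) && $id ne $last_id) {
                output();
            }
            $last_id = $id;
            push @group, $_;
        }
        output() if @group;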

    If there are more than 100 records for one id, it'll put them all in the same file despite the limit.

    Note that both my code and the parent's pseudocode assume that the records are grouped by id in the input file.
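
    If you're not sure the input is grouped that way, a quick check along these lines could be run first. This is a sketch only: first_ungrouped_id is a hypothetical helper, and it assumes, like the code above, that the ID is the first whitespace-delimited field of each line. It keeps one hash entry per distinct ID, so memory use grows with the number of IDs, not the number of lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first ID that reappears after a different
# ID has been seen in between, or undef if the input is grouped by ID.
sub first_ungrouped_id {
    my ($in) = @_;
    my (%finished, $prev_id);

    while (my $line = <$in>) {
        my ($id) = $line =~ /^(\S+)/;
        next if !defined $id;                         # skip lines with no ID
        next if defined $prev_id && $id eq $prev_id;  # still in the same group

        return $id if $finished{$id};    # this ID's group already ended earlier
        $finished{$prev_id} = 1 if defined $prev_id;
        $prev_id = $id;
    }
    return;
}
```

    If it returns an ID, sorting the input by ID before splitting would restore the grouping.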