Re: Help spliting file into chunks

The logic you need is, in pseudocode:

while there's a next line
    extract the ID
    check if ID matches the previous ID
    if not:
       set previous ID to current ID
       increment a counter
       if counter > 100
            close current output file
            open a new output file
            reset counter
    
    write the line to the current output file.

Try to write that in Perl; you can translate it almost directly. If you have trouble with a specific step, show us what you've tried and where your problem is.

Replies are listed 'Best First'.

Re^2: Help spliting file into chunks
by ikegami (Patriarch) on Jul 28, 2009 at 16:03 UTC

That can create empty files. Fix:

while there's a next line
    extract the ID
    check if ID matches the previous ID
    if not:
        set previous ID to current ID
        increment a counter
        if counter > 100
            close current output file
            reset counter
    
    if output file isn't open
        open a new output file

    write the line to the current output file.
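For reference, the fixed pseudocode above translates to Perl roughly as follows. This is only a sketch under a few assumptions of mine: the sub name split_by_id, the file-name prefix argument, and the choice to skip lines without an ID are not from the thread, and the ID is taken to be the first whitespace-delimited field. Like the pseudocode, it counts distinct IDs per file, not lines, and it assumes $max_ids is at least 1.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the lines read from $in into files named
# "<prefix>0000", "<prefix>0001", ... with at most $max_ids distinct IDs
# per file. Assumes lines with the same ID are adjacent in the input.
sub split_by_id {
    my ($in, $prefix, $max_ids) = @_;
    my ($prev_id, $out);
    my $id_count   = 0;
    my $file_count = 0;
    my @files;

    while (my $line = <$in>) {
        my ($id) = $line =~ /^(\S+)/;
        next if !defined $id;    # assumption: lines without an ID are skipped

        if (!defined($prev_id) || $id ne $prev_id) {
            $prev_id = $id;
            if (++$id_count > $max_ids) {
                close($out) or die("Error closing file: $!\n");
                undef $out;
                $id_count = 1;   # the current ID is the first one in the next file
            }
        }

        # Opening lazily means a file is only created when there is a line for it.
        if (!defined $out) {
            my $name = sprintf('%s%04d', $prefix, $file_count++);
            open($out, '>', $name) or die("Error creating file $name: $!\n");
            push @files, $name;
        }

        print {$out} $line;
    }

    if (defined $out) {
        close($out) or die("Error closing file: $!\n");
    }
    return @files;
}
```

Calling split_by_id(\*STDIN, 'file', 100) would then read standard input and return the names of the files it wrote.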

It can also create files with more than 100 records. Fix:

my $last_id;
my @group;
my $fh;
my $line_counter = 0;
my $file_counter = 0;

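# Write the buffered group to the current output file, starting a new
# file first if the group would push the current one past 100 lines.
sub output {
    if ($line_counter + @group > 100) {
        $fh = undef;    # dropping the last reference closes the file
        $line_counter = 0;
    }

    if (!defined($fh)) {
        my $fn = sprintf('file%04d', $file_counter++);
        open($fh, '>', $fn)
            or die("Error creating file $fn: $!\n");
    }

    $line_counter += @group;
    print($fh splice(@group));    # splice returns the lines and empties @group
}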

while (<>) {
    my ($id) = /^(\S+)/;
    $last_id = $id if !defined($last_id);

    # On an ID change, flush the finished group and start a new one.
    if ($id ne $last_id) {
        output();
        $last_id = $id;
    }

    push @group, $_;
}

output() if @group;

If there are more than 100 records for one id, it'll put them all in the same file despite the limit.

Note that both my code and the parent's pseudocode assume that the records are grouped by id in the input file.
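If you're not sure your input meets that assumption, a quick check along these lines can tell you before you split anything. Again only a sketch: the sub name first_ungrouped_id is made up for illustration, the ID is assumed to be the first whitespace-delimited field, and lines without an ID are ignored.

```perl
use strict;
use warnings;

# Hypothetical check: returns the first ID that shows up again after a
# different ID has been seen, or nothing if the input is grouped by ID.
sub first_ungrouped_id {
    my ($in) = @_;
    my %seen;
    my $prev_id;

    while (my $line = <$in>) {
        my ($id) = $line =~ /^(\S+)/;
        next if !defined $id;                            # ignore lines without an ID
        next if defined($prev_id) && $id eq $prev_id;    # still inside the same group
        return $id if $seen{$id}++;                      # this ID already had a group
        $prev_id = $id;
    }
    return;
}
```

If it does return an ID, sorting the input by ID first is one way to restore the grouping, at the cost of changing the order of the groups.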
