All,
This really isn't cute, but is more of a pattern I use quite frequently. It is simple and obvious, and yet I see some people stumble with it, so I thought I would share.

The general problem is a file where each line consists of identifiable fields but a "record" spans multiple lines. There is some key field whose value is repeated on each line of the record. The process typically starts with externally sorting the file so that all the lines of a record are adjacent.
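
For example (the comma-delimited layout and file names here are invented purely for illustration), the input might look like the lines below, with the key in the first field, and an external sort(1) on that field groups the records before the Perl script ever sees them:

    ord1001,widget,3
    ord1002,gadget,1
    ord1001,gizmo,7

    sort -t, -k1,1 unsorted.txt > sorted.txt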

Then it is simply a matter of pushing all the matching lines into an array and processing the array as soon as all the lines for that record have been read.

#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0] or die "Usage: $0 <input_file>";
open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $!";

my ($curr_key, @rec) = ('', ());
while (<$fh>) {
    chomp;
    my $entry = parse_line($_);
    if ($entry->{key} ne $curr_key) {
        # key changed, so the previous record is complete
        process_rec($curr_key, \@rec);
        ($curr_key, @rec) = ($entry->{key}, $entry);
    }
    else {
        push @rec, $entry;
    }
}
# flush the final record
process_rec($curr_key, \@rec);

sub parse_line {
    my ($line) = @_;
    my %entry;
    # ...
    return \%entry;
}

sub process_rec {
    my ($key, $rec) = @_;
    return if ! @$rec;    # nothing to process on the very first line
    # ...
}
Of course, parse_line() is usually overkill because the line is delimited in such a way that split is sufficient (a sketch of that follows the list below). Here are some things that may not be so obvious:

- process_rec() has to be called one final time after the loop, otherwise the last record in the file is silently dropped.
- The first line of the file never matches the initial $curr_key of '', so process_rec() guards against being handed an empty record (return if ! @$rec).
- The loop only compares adjacent lines, so the whole approach depends on the file having been sorted (or at least grouped) by the key field first.
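
As a rough sketch only (the three-field comma-delimited layout and the tallying in process_rec() are made up for this example, not part of the code above), parse_line() might reduce to a single split and process_rec() might just summarize each record:

    sub parse_line {
        my ($line) = @_;
        # assumes lines like KEY,NAME,VALUE - adjust the delimiter
        # and field names to match the real file
        my %entry;
        @entry{qw(key name value)} = split /,/, $line, 3;
        return \%entry;
    }

    sub process_rec {
        my ($key, $rec) = @_;
        return if ! @$rec;
        # hypothetical processing: report how many lines made up the record
        printf "%s: %d line(s)\n", $key, scalar @$rec;
    }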

Cheers - L~R

Re: Process Records Spread Across Multiple Lines
by ig (Vicar) on Feb 16, 2011 at 09:09 UTC

    What you wrote is easy to read but, for me, the following is easier...

    my @rec;
    while (<$fh>) {
        my $entry = parse_line($_);
        if (@rec and $entry->{key} ne $rec[0]->{key}) {
            process_rec(\@rec);
            @rec = ();
        }
        push @rec, $entry;
    }
    process_rec(\@rec);

    sub parse_line {
        my ($line) = @_;
        chomp($line);
        my %entry;
        # ...
        return \%entry;
    }

    sub process_rec {
        my ($rec) = @_;
        return if ! @$rec;
        # ...
    }