All,
This isn't really a cute trick so much as a pattern I use quite frequently. It is simple and obvious, and yet I see some people stumble over it, so I thought I would share.

The general problem is a file where each line is made up of identifiable fields but a "record" spans multiple lines. Some key field has the same value on every line of the record. The process typically starts with externally sorting the file on that key so that all lines of a record are adjacent.
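For a small file that fits in memory, the sorting step can be sketched in Perl itself. This is only an illustration under assumptions not in the post: the lines are tab-delimited, the key is the first field, and sort_lines_by_key() is a hypothetical helper name. Ties are broken on the original line number so that lines within a record keep their order.

```perl
use strict;
use warnings;

# Hypothetical helper: orders lines by their first tab-delimited field.
# Lines with equal keys keep their original relative order.
sub sort_lines_by_key {
    my @lines = @_;
    my @decorated;
    for my $i (0 .. $#lines) {
        my ($key) = split /\t/, $lines[$i], 2;
        $key //= '';    # an empty line has no fields at all
        push @decorated, [ $key, $i, $lines[$i] ];
    }
    return map { $_->[2] }
           sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @decorated;
}

print "$_\n" for sort_lines_by_key("b\t1", "a\t1", "b\t2", "a\t2");
```

For anything large, an external sort as described above is still the better fit, since this sketch holds every line in memory.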

Then it is simply a matter of pushing all the matching lines onto an array and processing the array as soon as all the lines for that record have been read.

#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0] or die "Usage: $0 <input_file>";
open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $!";

my ($curr_key, @rec) = ('', ());
while (<$fh>) {
    chomp;
    my $entry = parse_line($_);
    if ($entry->{key} ne $curr_key) {
        process_rec($curr_key, \@rec);
        ($curr_key, @rec) = ($entry->{key}, $entry);
    }
    else {
        push @rec, $entry;
    }
}
process_rec($curr_key, \@rec);

sub parse_line {
    my ($line) = @_;
    my %entry;
    # ...
    return \%entry;
}

sub process_rec {
    my ($key, $rec) = @_;
    return if ! @$rec;
    # ...
}

Of course, parse_line() is usually overkill because the line is delimited in such a way that split is sufficient. Here are some things that may not be so obvious:

The code will not do the right thing if you pass an input file called '0', because '0' is false in Perl and the "or die" fires
The return if ! @$rec; handles the call made for the first line in the file, when no record has been collected yet
The call to process_rec() after the while loop is necessary to process the last record
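As a rough sketch of two of the points above: a split-based parse_line() and an argument check that accepts a file called '0'. The tab delimiter, the key being the first field, the "fields" hash entry, and the input_file() helper are all assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical line layout: tab-delimited, with the key in the first field.
sub parse_line {
    my ($line) = @_;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my ($key, @fields) = split /\t/, $line, -1;
    return { key => $key, fields => \@fields };
}

# Hypothetical helper: returns the file name from the argument list.
# Testing length rather than truth means a file called '0' is accepted.
sub input_file {
    my @args = @_;
    die "Usage: $0 <input_file>\n" unless @args && length $args[0];
    return $args[0];
}
```

In the script above, my $file = input_file(@ARGV); would then stand in for the $ARGV[0] or die line. Note that an empty line splits into no fields at all, so the key would be undefined and a caller may want to skip blank lines first.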

Cheers - L~R

In reply to Process Records Spread Across Multiple Lines by Limbic~Region
