To propose a really fitting solution, we need to see some code.

How do you extract the title? Do you read it in a separate run through the file? Do you have a loop that does one of several things depending on what the current line starts with? What else is your code doing? The solution will differ depending on your existing implementation.

I am guessing that the information is all located in a single file, that you're only making one pass over it, and that all pieces of information follow the format you already showed (i.e. if a field is broken across multiple lines, the continuation lines start with the same tag followed by a line number).

In that case, the way I'd handle this is to read the lines batchwise, reconstruct them into a single line, then hand it off to the appropriate handler.

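my %handler = (
    HEADER => sub { ... },
    TITLE  => sub { ... },
    COMPND => sub { ... },
);

# hand the reassembled text of a finished block to its handler
sub dispatch {
    my ($tag, $text) = @_;
    $handler{$tag}->($tag, $text) if exists $handler{$tag};
    # complain_about_unknown() if not exists $handler{$tag}; ?
}

my ($tag, $text) = ("") x 2;
while(<>) {
    chomp;
    my ($curr_tag, $curr_text) = split /\s+/, $_, 2;
    next unless defined $curr_tag and length $curr_tag;  # skip blank lines
    $curr_text = "" unless defined $curr_text;
    if($curr_tag ne $tag) {
        # a new tag starts: flush the previous block, begin a new one
        dispatch($tag, $text) if length $tag;
        ($tag, $text) = ($curr_tag, $curr_text);
    }
    else {
        # continuation line: strip the leading line number
        my $curr_linenr;
        ($curr_linenr, $curr_text) = split /\s+/, $curr_text, 2;
        # perform validation on line nr here?
        $text .= " " . $curr_text if defined $curr_text;
    }
}
dispatch($tag, $text) if length $tag;  # the last block in the file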

So now we have a parser that lets us write per-tag handlers which don't individually need to worry about text spanning multiple lines. And then the distinction is painless:

my %record;

my %handler = (
    # ...
    TITLE => sub { $record{TITLE} = $_[1] unless exists $record{TITLE} },
    # ...
);

Or if there are multiple records per file:

my $curr_rec = 0;
my @record;

my %handler = (
    # ...
    HEADER => sub { ++$curr_rec },
    TITLE  => sub {
        $record[$curr_rec]->{TITLE} = $_[1]
            unless exists $record[$curr_rec]->{TITLE}
    },
    # ...
);

You get the idea.
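
If you want to try the approach end to end, here is a minimal self-contained sketch. It assumes the layout I guessed at above (first line is the tag plus text, continuation lines are the tag, a line number, then text). The sample lines, the parse_lines helper and the contents of %record are made up for illustration and are not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would read from <> instead.
my @lines = (
    'HEADER    SAMPLE ENTRY',
    'TITLE     FIRST TITLE THAT IS',
    'TITLE    2 SPLIT OVER TWO LINES',
    'COMPND    SOMETHING ELSE',
    'TITLE     A LATER TITLE WE WANT TO IGNORE',
);

my %record;
my %handler = (
    HEADER => sub { $record{HEADER} = $_[1] },
    # keep only the first TITLE block
    TITLE  => sub { $record{TITLE} = $_[1] unless exists $record{TITLE} },
);

# Join continuation lines and call the matching handler once per block.
sub parse_lines {
    my ($handlers, @input) = @_;
    my ($tag, $text) = ('', '');
    my $flush = sub {
        $handlers->{$tag}->($tag, $text) if exists $handlers->{$tag};
    };
    for my $line (@input) {
        my ($curr_tag, $curr_text) = split /\s+/, $line, 2;
        next unless defined $curr_tag and length $curr_tag;
        $curr_text = '' unless defined $curr_text;
        if ($curr_tag ne $tag) {
            $flush->();
            ($tag, $text) = ($curr_tag, $curr_text);
        }
        else {
            # continuation: drop the line number, append the rest
            my (undef, $rest) = split /\s+/, $curr_text, 2;
            $text .= ' ' . $rest if defined $rest;
        }
    }
    $flush->();    # the last block has no following tag to trigger it
}

parse_lines(\%handler, @lines);
print "$record{TITLE}\n";    # FIRST TITLE THAT IS SPLIT OVER TWO LINES
```

The second TITLE block is reassembled and dispatched just like the first one; it is the handler, not the parser, that decides to ignore it.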

Makeshifts last the longest.

In reply to Re: Finding first block of contiguous elements in an array by Aristotle
in thread Finding first block of contiguous elements in an array by FamousLongAgo
