To propose a really fitting solution, we need to see some code.
How do you extract the title? Do you read it in a separate run through the file? Do you have a loop that does one of several things depending on what the current line starts with? What else is your code doing? The solution will differ depending on your existing implementation.
I am guessing that the information is all located in a single file, that you're only doing one iteration over it, and that all pieces of information follow the format you already showed (i.e. if broken across multiple lines, the following lines start with the same tag followed by a line number).
In that case, the way I'd handle this is to read the lines batchwise, reconstruct them into a single line, then hand it off to the appropriate handler.
my %handler = (
    HEADER => sub { ... },
    TITLE  => sub { ... },
    COMPND => sub { ... },
);

my ($tag, $text) = ("") x 2;

while (<>) {
    chomp;
    my ($curr_tag, $curr_text) = split /\s+/, $_, 2;

    if ($curr_tag ne $tag) {
        # tag changed: the previous batch is complete, so dispatch it
        $handler{$tag}->($tag, $text) if exists $handler{$tag};
        # complain_about_unknown() if not exists $handler{$tag}; ?
        ($tag, $text) = ($curr_tag, $curr_text);
    }
    else {
        # continuation line: strip the leading line number
        my $curr_linenr;
        ($curr_linenr, $curr_text) = split /\s+/, $curr_text, 2;
        # perform validation on line nr here?
        $text .= " " . $curr_text;
    }
}

# the last batch is still pending when the input runs out
$handler{$tag}->($tag, $text) if exists $handler{$tag};
So now we have a parser that lets us write per-tag handlers that never need to worry about text spanning multiple lines. And then the distinction is painless:
my %record;
my %handler = (
    # ...
    TITLE => sub { $record{TITLE} = $_[1] unless exists $record{TITLE} },
    # ...
);
Or if there are multiple records per file:
my $curr_rec = 0;   # bumped on every HEADER, so records start at index 1
my @record;
my %handler = (
    # ...
    HEADER => sub { ++$curr_rec },
    TITLE  => sub {
        $record[$curr_rec]->{TITLE} = $_[1]
            unless exists $record[$curr_rec]->{TITLE}
    },
    # ...
);
You get the idea.
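For something you can run as-is, here is a minimal sketch that puts the pieces together. It is only an illustration under my guesses above: the sample lines are made up, parse_lines is a hypothetical helper name, and it reads from an array rather than from <> so that it is self-contained. It also uses push and $record[-1] instead of a counter, which is a small variation on the multi-record handlers above, not a requirement.

```perl
use strict;
use warnings;

# Made-up sample input: two records; the first TITLE spans two lines
# and the first record also carries a second TITLE that should be ignored.
my @lines = (
    "HEADER    FIRST RECORD",
    "TITLE     STRUCTURE OF SOMETHING",
    "TITLE    2 CONTINUED HERE",
    "COMPND    MOLECULE A",
    "TITLE     A LATER TITLE THAT IS IGNORED",
    "HEADER    SECOND RECORD",
    "TITLE     ANOTHER TITLE",
);

my @record;
my %handler = (
    # every HEADER starts a new record
    HEADER => sub { push @record, { HEADER => $_[1] } },
    # keep only the first TITLE seen in the current record
    TITLE  => sub {
        return unless @record;
        $record[-1]{TITLE} = $_[1] unless exists $record[-1]{TITLE};
    },
    COMPND => sub {
        return unless @record;
        $record[-1]{COMPND} = $_[1];
    },
);

# Hypothetical helper: batch consecutive lines with the same tag,
# join their text, then dispatch to the handler for that tag.
sub parse_lines {
    my ($lines, $handler) = @_;
    my ($tag, $text) = ("", "");
    my $flush = sub {
        $handler->{$tag}->($tag, $text) if exists $handler->{$tag};
    };

    for my $line (@$lines) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($curr_tag, $curr_text) = split /\s+/, $line, 2;
        $curr_text = "" unless defined $curr_text;

        if ($curr_tag ne $tag) {
            $flush->();               # previous batch is complete
            ($tag, $text) = ($curr_tag, $curr_text);
        }
        else {
            # continuation line: drop the leading line number
            (undef, $curr_text) = split /\s+/, $curr_text, 2;
            $text .= " " . (defined $curr_text ? $curr_text : "");
        }
    }
    $flush->();                       # dispatch the final batch
}

parse_lines(\@lines, \%handler);
print "$_->{TITLE}\n" for @record;
```

Note that a tag only counts as continued while the lines are adjacent, so the second TITLE in the first record arrives as a separate batch and the handler can simply ignore it.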
Makeshifts last the longest.