Just reading them and then throwing them away doesn't look too perlish.
The data file is stored on disk as just a linear sequence of bytes.
The disk/file system doesn't know anything about "skip the first 2 lines" or "skip
blank lines" or "skip delimiter lines". All it knows about is reading bytes
sequentially from the file.
So one way or the other, the header lines and \n indicating blank lines and the delimiters
have to be read from the disk. It is not possible to "not read the delimiter lines".
Somebody has to decide which lines to "throw away" and
that somebody is the user. The only question is which technique and/or Perl
module the user wants to use. There isn't a single "right" answer. That's why
you got a couple of responses with different approaches. There are also some ways
that I consider "less good", which weren't offered as possibilities.
The basic job is to decide whether you are inside a record or not. This means
that there has to be some state information to know that a new record has started and
when that record has ended.
One way, as I showed in Re: Parsing text sections, is to call a subroutine when the record starts
and have that subroutine finish reading the record. The fact that you are in the
subroutine means that a record has started. A flag like "INSIDE_RECORD?", true/false
is not needed as it is implicit by the fact of being in the "finish the record"
subroutine. This is a common coding pattern for this task and would be seen in
other languages like C. I didn't show the code for calling the sub-parser, but obviously you would call it based upon what I called the "header" (the record type info from the @ line).
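Here is a minimal sketch of that pattern. The data layout is an assumption made up for illustration (two header lines, a blank line, records opened by an "@ TYPE" line and closed by a line of dashes), and the name finish_record is hypothetical; adapt both to the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data, not the original poster's file.
my $data = <<'END';
Report header
Generated by hand

@ fruit
apple
banana
-----
@ veg
carrot
-----
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my %records;    # record type => array ref of body lines

while (defined(my $line = <$in>)) {
    # Everything outside a record (headers, blank lines) is read and dropped.
    next unless $line =~ /^\@\s*(\w+)/;
    my $type = $1;
    push @{ $records{$type} }, finish_record($in);
}
close $in;

# Being inside this sub *is* the "inside a record" state; no flag is needed.
sub finish_record {
    my ($fh) = @_;
    my @body;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line =~ /^-+$/;    # end-of-record delimiter
        push @body, $line;
    }
    return @body;
}

print "$_: @{ $records{$_} }\n" for sort keys %records;
```

The main loop only has to recognize the start of a record; everything about the record's body lives in one place.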
BTW, it wasn't needed here, but if what "ends the record" is
the start of a new record, instead of "unreading" that line in various ways,
another way is to set a "noread" flag: while ($noread || defined($line=<IN>)). When the
flag is set, the read is skipped, which keeps $line for another iteration of the loop. If you are
designing the format, avoiding this "start of new record means end of previous record"
saves grief. In this particular case having records separated only by an "----@ type" line would have made the record parsing more problematic.
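A rough sketch of the "noread" idea, assuming a made-up format in which a record ends only when the next "@ TYPE" line appears. The loop condition skips the read whenever the flag is set, so the header line that ended the previous record is still in $line on the next pass.

```perl
use strict;
use warnings;

# Hypothetical data: no end delimiter, the next "@ TYPE" line ends a record.
my $data = <<'END';
@ fruit
apple
banana
@ veg
carrot
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @records;       # each element: [ type, body line, body line, ... ]
my $noread = 0;    # true => reuse the current $line instead of reading
my $line;

while ($noread || defined($line = <$in>)) {
    $noread = 0;
    next unless $line =~ /^\@\s*(\w+)/;
    my @record = ($1);

    # Collect body lines until the next record header or end of file.
    while (defined($line = <$in>)) {
        if ($line =~ /^\@/) {
            $noread = 1;    # keep this header line for the outer loop
            last;
        }
        chomp $line;
        push @record, $line;
    }
    push @records, \@record;
}
close $in;

print join(' ', @$_), "\n" for @records;
```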
You should note that regexes in Perl can be stored in variables (see qr//)!! This is way cool
and applicable to all techniques.
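For example, the patterns can be compiled once with qr// and then used by name. The patterns and variable names below are assumptions for illustration, not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical patterns; adjust them to the real file format.
my $record_start = qr/^\@\s*(\w+)/;    # "@ TYPE" opens a record
my $record_end   = qr/^-+\s*$/;        # a line of dashes closes it
my $blank        = qr/^\s*$/;          # empty or whitespace-only line

my @lines = ('@ fruit', 'apple', '', '-----');

my @kinds;
for my $line (@lines) {
    push @kinds,
          $line =~ $record_start ? "start($1)"
        : $line =~ $record_end   ? 'end'
        : $line =~ $blank        ? 'blank'
        :                          'data';
}
print "@kinds\n";
```

Keeping the patterns in variables means the parsing code stays the same if the delimiters change; only the qr// definitions need editing.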
The second way is to use flags to indicate whether or not you are inside the record.
You can do the logic for this yourself, which I would consider a "not as good" way. Or,
as Grandfather did, you can use the triple dot, or "flip-flop", operator. Read his node
about it: Flipin good, or a total flop?. Also read the other posts there on the various
ways to exclude the lines that trigger the record.
This very special Perl operator essentially sets up flags for you to keep track of
where you are. This is a cool critter and it takes some experimentation to understand it. If you read the above carefully, you will see that it also keeps track of the line number within the record, which can sometimes be very helpful.
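A small sketch of the flip-flop in action, again on made-up data ("@ TYPE" opens a record, a line of dashes closes it). In scalar context the operator returns a sequence number that starts at 1 on the triggering line and has "E0" appended on the closing line, which is how the two delimiter lines are excluded here.

```perl
use strict;
use warnings;

# Hypothetical sample data, not the original poster's file.
my $data = <<'END';
Report header

@ fruit
apple
banana
-----
@ veg
carrot
-----
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
while (defined(my $line = <$in>)) {
    chomp $line;
    # False until the left match succeeds, then true through the line
    # on which the right match succeeds.
    my $seq = $line =~ /^\@/ ... $line =~ /^-+$/;
    next unless $seq;          # outside any record
    next if $seq == 1;         # the "@ TYPE" line that started the record
    next if $seq =~ /E0$/;     # the delimiter line that ended it
    push @seen, "$seq:$line";
}
close $in;

print "@seen\n";
```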
So this was a long post to say: Yes, all the lines have to be read from the file and the "bad ones" thrown away. This node shows 2 ways to do that, one of which is very Perl specific. Which way you prefer is up to you and often depends upon hard to quantify factors, like who is going to be maintaining this code.