while (<MYFILE>) {
    if (/^FH$/ .. /^FT$/) {
        # $_ is a line within FH and FT lines
        # including the delimiting lines
    }
    if (/^BH$/ ... /^BH$|^FT$/) {
        # $_ is a line between BH lines, or between
        # a BH and FT line
    }
}
Basically, the first if is true from a line containing only FH through the next line containing only FT, delimiters included. The second if becomes true at a BH line and stays true through the next line that is either another BH or an FT.
Hope this helps,
-- Bird
Oh, the reason the second if uses three dots (...) is that the two-dot version can become false in the same check in which it became true. Essentially, if you use the two-dot version to match a block that uses the same start and end delimiter, you may end up processing only the first line of the block (which would be the delimiter, in this case).
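To make the two-dot versus three-dot difference concrete, here is a minimal sketch. The sample list and the sub names two_dot and three_dot are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample: blocks that open and close with the same "BH" marker.
my @lines = qw(BH a b BH c BH d e BH);

# Collect the lines for which the two-dot flip-flop is true.
sub two_dot {
    my @hit;
    for (@_) { push @hit, $_ if /^BH$/ .. /^BH$/; }
    return @hit;
}

# Same test, but with three dots: the right operand is not
# checked on the evaluation that switched the range on.
sub three_dot {
    my @hit;
    for (@_) { push @hit, $_ if /^BH$/ ... /^BH$/; }
    return @hit;
}

print "two dots:   @{[ two_dot(@lines) ]}\n";    # BH BH BH BH
print "three dots: @{[ three_dot(@lines) ]}\n";  # BH a b BH BH d e BH
```

With two dots, every BH line opens and closes the range in the same check, so only the delimiters match. With three dots, the lines between delimiters match as well. Note that c is not selected in the three-dot run: the BH that closes one block is used up as the end marker, so it cannot also open the next block.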
Oh, my God!
Bird, your node shows me the light on the range operator, which I didn't know in its full power!
I think that the information you linked is worth reading immediately, so I paste it here:
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once. If you don't want it to test the right operand till the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does.
The right operand is not evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is in the "true" state. The precedence is a little lower than || and &&. The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1. If either operand of scalar ".." is a constant expression, that operand is implicitly compared to the $. variable, the current line number.
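As a small illustration of the sequence number and the "E0" suffix described above, the sketch below keeps only the lines strictly between the delimiters. The sample list and the sub name inside_only are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical input in the FH/BH/FT layout discussed in this thread.
my @lines = qw(junk FH BH 111 222 FT junk);

# Return only the lines strictly between the FH and FT lines.
sub inside_only {
    my @inner;
    for (@_) {
        # Scalar assignment puts ".." in scalar context:
        # "" outside the range, 1, 2, ... inside, and "NE0" on the last line.
        my $seq = /^FH$/ .. /^FT$/;
        next unless $seq;          # outside the range
        next if $seq == 1;         # skip the opening delimiter
        next if $seq =~ /E0$/;     # skip the closing delimiter
        push @inner, $_;
    }
    return @inner;
}

print join(",", inside_only(@lines)), "\n";   # BH,111,222
```

The numeric comparison is safe because a value such as "6E0" still reads as the number 6, which is exactly why the suffix was chosen.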
Ciao! --bronto
# Another Perl edition of a song:
# The End, by The Beatles
END {
$you->take($love) eq $you->made($love) ;
}
You give too little information for me to make a more elaborate suggestion, but I'll try to give you a generic one.
When processing a file like this, I try to make just one while loop, and then have some kind of statemachine-ish construction to do the actual work for me. The important thing is to avoid reading from the file in more than one place, like in the main loop and then a sub loop that exhausts some data, since that always seems to give me problems where I must reinsert data back into the buffer in some way, or handle special cases.
So, this is my general design principle (in perl-ish pseudocode):
my $state = "";
open(SESAME, "<", "infile") or die "Couldn't open infile: $!\n";
while (<SESAME>) {
    # Setting the "states" of the "state machine"
    if ( $_ =~ /FH/ ) { $state = "FH" }
    if ( $_ =~ /BH/ ) { $state = "BH" }
    # ... more states as needed

    # Do different things with the data depending on the
    # settings of the "state machine"
    if ( $state eq "FH" ) {
        # do this
    }
    if ( $state eq "BH" ) {
        # do that
    }
    # ... more handlers as needed
}
close SESAME;
No idea if this helps you.
In the TMTOWTDI spirit, here is one that uses some sugar cooked up by thedamian.
Be forewarned, it makes some assumptions about your file format that may not be true.
#!/usr/bin/perl -w
use strict;
$|++;
##
# NOTE: This code Assumes (and we all know what that means) that the file
# being fed to it has no more than one "boundary" (/FH|BH/) per line, and
# that the file is delimited by newlines.
#
use Switch 'Perl6';            # Import thedamian's sugar, it's better than C&H.
use English '-no_match_vars';  # since we're using some Perl 6 syntax here, may
                               # as well get rid of $0 in the usage statement
my $file = shift or
    die "USAGE $PROGRAM_NAME filename\n";
open( FH, $file ) or
    die "Couldn't open $file: $!\n";
my $batchnum = 0; # global batch tracker
while (<FH>) {
    chomp();
    next if m/^$/;
    given ($_) {
        when /^FH$/ { print "File Header\n"; last; }
        when /^BH$/ { print "Batch Header\n"; $batchnum++; last; }
        when /^FT$/ { print "File Trailer\n"; last; }
        when /.*/ { handle_batch_content($_); }
    }
}
sub handle_batch_content {
    my $batch_content = shift;
    print "Got $batch_content in $batchnum\n"; # or whatever else you want to do
}
Given the example file you provided, assuming that Example is newline delimited, this script produces:
File Header
Batch Header
Got 1234123 in 1
Got 1234123 in 1
Batch Header
Got 1234963 in 2
Got 1234963 in 2
Got 1234963 in 2
Batch Header
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
File Trailer
HTH,
dug
More TMTOWTDI...
Assuming your BH is predictable and has a static component...
FileStart
Batch: 123
234
1235613246
1434312
12521
124215
Batch: 133
614
1641
32463
142351
123
Batch: 358
214
125
612
FileEnd
Then you could set your Input Record Separator to $/="Batch"
and get each batch as a chunk, then process each chunk individually.
open(IN, "somefile.txt") or die "Couldn't open somefile.txt: $!\n";
$fileheadinfo = <IN>;                     # first line is the file header
$/ = "Batch";
while ($batch = <IN>) {
    next if $batch eq "Batch";            # first chunk is only the separator
    @lines = split /\n/, $batch;
    $batchinfo = "Batch" . shift @lines;  # get batch info
    pop @lines;                           # get rid of bare Batch (or FileEnd) at end
    foreach $line (@lines) {
        process($line);
    }
}