Parse File With Sub While Loops

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parse File With Sub While Loops by Bird (Pilgrim) on Sep 16, 2002 at 22:00 UTC
You may be able to do this very easily with the range operator.

```perl
while (<MYFILE>) {
    if (/^FH$/ .. /^FT$/) {
        # $_ is a line within FH and FT lines,
        # including the delimiting lines
    }
    if (/^BH$/ ... /^BH$|^FT$/) {
        # $_ is a line between BH lines, or between
        # a BH and FT line
    }
}
```

Basically, the first `if` is true only if we've already found a line containing only FH, but haven't yet found a line containing FT. The second `if` is true when we've found a BH line, but haven't yet found either another BH line or an FT line.

Hope this helps,
-- Bird

Oh, the reason the second `if` uses three dots (`...`) is that the two-dot version can become false in the same check in which it became true. Essentially, if you use the two-dot version to match a block that uses the same start and end delimiter, you may end up processing only the first line of the block (which would be the delimiter, in this case).
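Traced over a file with several batches, the second range has a catch: the BH line that ends one range is consumed as that range's end point, so the lines of the next batch never satisfy the left operand and every other batch is skipped (with the sample file in dug's reply, the second batch would be missed). A minimal sketch that keeps the `..` range for the FH/FT envelope but simply counts batches on each BH line, assuming one marker per line; the sub name `parse_batches` and the in-memory input are illustrative, not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group data lines by batch, using the flip-flop
# only for the FH..FT envelope and a new array for every BH line.
sub parse_batches {
    my ($fh) = @_;
    my @batches;                      # $batches[$n] = [lines of batch $n+1]
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^FH$/ .. $line =~ /^FT$/;   # inside FH..FT only
        next if $line =~ /^(?:FH|FT)$/;                   # skip the delimiters
        if ($line eq 'BH') {
            push @batches, [];        # a BH line always starts a new batch
            next;
        }
        push @{ $batches[-1] }, $line if @batches;
    }
    return \@batches;
}

my $data = "FH\nBH\n1234123\n1234123\nBH\n1234963\nBH\n1234999\nFT\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";
my $batches = parse_batches($in);
for my $i (0 .. $#$batches) {
    printf "batch %d: %s\n", $i + 1, join(', ', @{ $batches->[$i] });
}
```

Because a BH line never has to close anything, consecutive batches cannot swallow each other's header.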
Re: Re: Parse File With Sub While Loops by bronto (Priest) on Sep 17, 2002 at 10:46 UTC
Oh, my God! Bird, your node showed me the light on the range operator, which I didn't know in its full power! I think the information you linked is worth reading right away, so I'll paste it here:

In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once. If you don't want it to test the right operand till the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is in the "true" state. The precedence is a little lower than || and &&. The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.

If either operand of scalar ".." is a constant expression, that operand is implicitly compared to the $. variable, the current line number.

Ciao!

`--bronto`

```perl
# Another Perl edition of a song:
# The End, by The Beatles
END {
    $you->take($love) eq $you->made($love) ;
}
```
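The sequence numbers described in that quote make it possible to drop the delimiter lines without a second regex test, and the constant-operand rule gives a quick "lines M to N" filter. A small sketch of both, assuming FH/FT markers as in this thread; `body_lines` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return only the lines strictly between FH and FT, using the value the
# flip-flop returns: 1 on the opening line, "nE0" on the closing one.
sub body_lines {
    my @out;
    for my $line (@_) {
        my $seq = ($line eq 'FH') .. ($line eq 'FT');
        next unless $seq;             # outside the range: empty string
        next if $seq == 1;            # first line of the range (FH)
        next if $seq =~ /E0$/;        # last line of the range (FT)
        push @out, $line;
    }
    return @out;
}

print join(' ', body_lines(qw(x FH a b FT y))), "\n";   # prints "a b"

# A constant operand is compared with $. (the current input line number),
# so this keeps input lines 2 through 3.
my $text = "one\ntwo\nthree\nfour\n";
open my $fh, '<', \$text or die "open: $!";
my @middle;
while (<$fh>) {
    chomp;
    push @middle, $_ if 2 .. 3;
}
close $fh;
print "@middle\n";                    # prints "two three"
```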
Re: Parse File With Sub While Loops by fsn (Friar) on Sep 16, 2002 at 21:51 UTC
You give too little information for me to make a more elaborate suggestion, but I'll try to give you a generic one.

When processing a file like this, I try to make just one while loop, and then have some kind of state-machine-ish construction do the actual work for me. The important thing is to avoid reading from the file in more than one place, such as in the main loop and then in a sub loop that consumes some of the data, since that always seems to give me problems where I must push data back into the buffer in some way, or handle special cases.

So, this is my general design principle (in Perl, with placeholders where the real work goes):

```perl
use strict;
use warnings;

my $state = '';    # current "state" of the "state machine"
open my $sesame, '<', 'infile' or die "Can't open infile: $!";
while (<$sesame>) {
    # Setting the "states" of the "state machine"
    if (/FH/) { $state = "FH" }
    if (/BH/) { $state = "BH" }
    # ... further states here

    # Do different things with the data depending on the
    # settings of the "state machine"
    if ($state eq "FH") {
        # do this
    }
    if ($state eq "BH") {
        # do that
    }
    # ... further actions here
}
close $sesame;
```

No idea if this helps you.
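One way to flesh out that skeleton, assuming one marker per line (FH, BH, FT) as in the other replies, is to key a dispatch table on the current state so the per-state work lives in one place and the file is still read in exactly one loop. The `run_state_machine` sub and the `%handler` table are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: one read loop, a state variable, and a table of
# per-state handlers. Marker lines change the state; every other line is
# handed to whatever handler the current state selects.
sub run_state_machine {
    my ($fh) = @_;
    my $state = 'START';
    my $batch = 0;
    my @log;
    my %handler = (
        START => sub { push @log, "ignored before FH: $_[0]" },
        FH    => sub { push @log, "file header data: $_[0]" },
        BH    => sub { push @log, "batch $batch: $_[0]" },   # sees current $batch
        FT    => sub { push @log, "trailer data: $_[0]" },
    );
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^(FH|BH|FT)$/) {      # state transitions
            $state = $1;
            $batch++ if $state eq 'BH';
            next;
        }
        $handler{$state}->($line);          # per-state work
    }
    return @log;
}

my $data = "FH\nBH\n111\n112\nBH\n221\nFT\n";
open my $in, '<', \$data or die "open: $!";
print "$_\n" for run_state_machine($in);
```

The closures share the lexical `$batch`, so a handler always reports the batch number that was current when its line was read.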
Re: Parse File With Sub While Loops by dug (Chaplain) on Sep 16, 2002 at 23:03 UTC
In the TIMTOWTDI spirit, here is one that uses some sugar cooked up by thedamian. Be forewarned: it makes some assumptions about your file format that may not be true.

```perl
#!/usr/bin/perl -w
use strict;
$|++;

##
# NOTE: This code assumes (and we all know what that means) that the file
# being fed to it has no more than one "boundary" (/FH|BH/) per line, and
# that the file is delimited by newlines.
#

use Switch 'Perl6';            # Import thedamian's sugar, it's better than C&H.
use English '-no_match_vars';  # since we're using some Perl 6 syntax here, may
                               # as well get rid of $0 in the usage statement

my $file = shift or die "USAGE $PROGRAM_NAME filename\n";
open( FH, $file ) or die "Couldn't open $file: $!\n";

my $batchnum = 0;  # global batch tracker

while (<FH>) {
    chomp();
    next if m/^$/;
    given ($_) {
        when /^FH$/ { print "File Header\n"; last; }
        when /^BH$/ { print "Batch Header\n"; $batchnum++; last; }
        when /^FT$/ { print "File Trailer\n"; last; }
        when /./    { handle_batch_content($_); }
    }
}

sub handle_batch_content {
    my $batch_content = shift;
    print "Got $batch_content in $batchnum\n";  # or whatever else you want to do
}
```

Given the example file you provided, assuming that Example* is newline delimited, this script produces:

```
File Header
Batch Header
Got 1234123 in 1
Got 1234123 in 1
Batch Header
Got 1234963 in 2
Got 1234963 in 2
Got 1234963 in 2
Batch Header
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
Got 1234999 in 3
File Trailer
```

HTH,
dug
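Switch.pm is implemented as a source filter; it was deprecated in Perl 5.12 and removed from the core distribution in 5.14, so today it has to be installed from CPAN. A rough equivalent of the same dispatch with a plain `if`/`elsif` chain, assuming the lines have already been chomped; `classify_lines` is a hypothetical name, and it takes a list of lines rather than a file so the sketch stays self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same dispatch as the given/when block, written without Switch.pm.
# Returns the report lines instead of printing them directly.
sub classify_lines {
    my @report;
    my $batchnum = 0;                     # batch tracker
    for my $line (@_) {
        next if $line eq '';              # skip blank lines
        if    ($line eq 'FH') { push @report, 'File Header' }
        elsif ($line eq 'BH') { push @report, 'Batch Header'; $batchnum++ }
        elsif ($line eq 'FT') { push @report, 'File Trailer' }
        else                  { push @report, "Got $line in $batchnum" }
    }
    return @report;
}

print "$_\n" for classify_lines(qw(FH BH 1234123 1234123 BH 1234963 FT));
```

On chomped lines, `$line eq 'FH'` matches exactly what `/^FH$/` matched, without any regex engine work.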
Re: Parse File With Sub While Loops by anithri (Beadle) on Sep 16, 2002 at 23:51 UTC
More TMTOWTDI... Assuming your BH is predictable and has a static component, as in `FileStart Batch: 123 234 1235613246 1434312 12521 124215 Batch: 133 614 1641 32463 142351 123 Batch: 358 214 125 612 FileEnd`, then you could set your input record separator to `$/ = "Batch"`, get each batch as a chunk, and process each chunk individually.

```perl
open my $in, '<', 'somefile.txt' or die "Can't open somefile.txt: $!";
my $fileheadinfo = <$in>;          # first line, read with the default $/
$/ = "Batch";                      # from now on, each read ends just after "Batch"
while (my $batch = <$in>) {
    next if $batch eq "Batch";     # first chunk: only the separator after the file header
    my @lines = split /\n/, $batch;
    my $batchinfo = "Batch" . shift @lines;   # get batch info
    pop @lines;    # get rid of bare "Batch" at end (on the last chunk, the final line instead)
    foreach my $line (@lines) { process($line) }   # process() is your own routine
}
```
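A self-contained sketch of the same record-separator trick, assuming `FileStart` and `FileEnd` sit on their own lines and each `Batch:` header line is followed by one value per line (the flattened example above does not show the exact line breaks). It uses `local $/` so the changed separator does not leak into other reads; `split_batches` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a file of "Batch"-delimited chunks and return a list of
# [header, @values] array refs.
sub split_batches {
    my ($fh) = @_;
    my $file_header = <$fh>;             # "FileStart\n", read with the default $/
    my @batches;
    local $/ = "Batch";                  # each read now ends just after "Batch"
    while (my $chunk = <$fh>) {
        next if $chunk eq "Batch";       # the separator that opens the first batch
        my @lines = split /\n/, $chunk;
        my $header = "Batch" . shift @lines;   # restore the word eaten by $/
        pop @lines;                      # trailing "Batch", or "FileEnd" on the last chunk
        push @batches, [ $header, @lines ];
    }
    return @batches;                     # $/ reverts to "\n" on return
}

my $data = "FileStart\nBatch: 123 234\n1235613246\n1434312\n"
         . "Batch: 133 614\n1641\nFileEnd\n";
open my $in, '<', \$data or die "open: $!";
for my $batch (split_batches($in)) {
    my ($header, @values) = @$batch;
    print "$header => @values\n";
}
```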
Re: Parse File With Sub While Loops by Aristotle (Chancellor) on Sep 17, 2002 at 10:30 UTC
Maybe you are looking for Inline::Files? Makeshifts last the longest.	[reply]