Parsing Files for the Interesting Bit

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I have, once again, a big text file which is mostly irrelevant except for the Interesting Bit. Say for instance it's

Line 1 blah blah blah
Line 2 blah blah blah
...
Line 1000 blah blah blah
Line 1001 --start foo--
[lots of stuff I do need to process]
Line 2000 --end foo--

So my probably not-very-programmerly instinct is to just do this:


my $interesting_bit = 0;

while(<FILE>){

  if(/--end foo--/){
    $interesting_bit = 0;
    last; ## assuming only one interesting
          ## bit per file of course
  }

  if($interesting_bit == 1){
    ## do my processing on the lines
  }

  if(/--start foo--/){
    $interesting_bit = 1;
    # we've found the line which says
    # the next line needs to be processed
  }
}

It works for me, but is that bad practice?

Doing it this way seems quick and straightforward once you've got the order sorted out, but rather clunky -- what do other monks think?

($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print


Replies are listed 'Best First'.
Re: Parsing Files for the Interesting Bit by borisz (Canon) on Jul 05, 2004 at 00:24 UTC
If it works, why change it? I would do it with the .. operator:

while (<FILE>) {
  if ( /--start foo--/ .. /--end foo--/ ) {
    # do what you like to do here
  }
}

Boris
Re: Parsing Files for the Interesting Bit by Zaxo (Archbishop) on Jul 05, 2004 at 00:46 UTC
Your markers look like they are fixed strings, so

if ( $_ eq "--end foo--\n" ) {
  # . . .
}

seems like a better test. Leave off "\n" if you chomp it.

A couple more approaches come to mind. If the file is not too big, you can abuse $/ to get the content in two gulps.

my $interesting_bit;
{
  # maybe open here . . .
  local $/ = '--start foo--' . "\n";
  $interesting_bit = <FILE>;   # everything up to the start marker; overwritten below
  local $/ = '--end foo--' . "\n";
  $interesting_bit = <FILE>;   # everything up to and including the end marker
  # . . . and close here
}

That could burden memory. A third way is similar to yours, but uses the flip-flop operator to condense the code.

my $interesting_bit;
{
  # maybe open here . . .
  local $_;
  while (<FILE>) {
    if ( $_ eq "--start foo--\n" .. $_ eq "--end foo--\n" ) {
      $interesting_bit .= $_;
    }
    elsif ( $interesting_bit ) {
      last;
    }
  }
  # . . . and close
}

This makes the same assumption about there being only one interesting bit. If the variable is populated while the flip-flop is false, we've run past the end marker and can quit reading.

I just noticed that I've changed the $interesting_bit variable from a flag in your code to an accumulator for the content. If you just set it true in the lhs of the flip-flop and do your processing in place of my .= operation, all will be well.

After Compline,
Zaxo
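For anyone who wants to try the $/ idea without a file on disk, here is a minimal sketch. It is not Zaxo's exact code: the sample text, the lexical in-memory filehandle, and the final chomp are my own assumptions. The chomp is there because the second gulp ends with the end marker, and chomp strips whatever $/ currently holds.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real file.
my $text = join '',
    "Line 1 blah blah blah\n",
    "--start foo--\n",
    "first interesting line\n",
    "second interesting line\n",
    "--end foo--\n",
    "trailing junk\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $interesting_bit;
{
    # First gulp: everything up to and including the start marker. Discard it.
    local $/ = "--start foo--\n";
    my $skipped = <$fh>;

    # Second gulp: everything up to and including the end marker.
    $/ = "--end foo--\n";
    $interesting_bit = <$fh>;

    # chomp removes the current value of $/, i.e. the end marker itself.
    chomp $interesting_bit if defined $interesting_bit;
}
close $fh;

print $interesting_bit if defined $interesting_bit;
```

The whole interesting bit still sits in memory at once, so the caveat about large files applies here as well.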
Re: Parsing Files for the Interesting Bit by dws (Chancellor) on Jul 05, 2004 at 01:03 UTC
I might do something like:

while ( <FILE> ) {
  if ( /--start foo--/ ) {
    # we're about to start processing
    while ( <FILE> ) {
      last if /--end foo--/;
      # here's a line of stuff to process
    }
    # we're done with this block
  }
}

This approach eliminates the conditional, at the risk of not noticing that a file has ended mid-block. If that's liable to be a problem, it's a straightforward mod to catch it.
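One possible version of that mod, as a rough sketch rather than dws's own code: the helper name extract_block, the $closed flag, and the sample strings are all hypothetical. The inner loop records whether it actually saw the end marker, and the helper dies if the file ran out first.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines between the markers and dies if
# the file ends before the end marker is seen.
sub extract_block {
    my ($fh) = @_;
    my @lines;

    while ( defined( my $line = <$fh> ) ) {
        next unless $line =~ /--start foo--/;

        # The inner loop consumes the block; $closed records whether the
        # end marker was actually seen.
        my $closed = 0;
        while ( defined( $line = <$fh> ) ) {
            if ( $line =~ /--end foo--/ ) {
                $closed = 1;
                last;
            }
            push @lines, $line;
        }
        die "File ended before --end foo-- was found\n" unless $closed;
        last;    # only one interesting bit per file, as in the original post
    }
    return @lines;
}

# Made-up sample data: one complete file, one cut off mid-block.
my $complete  = "junk\n--start foo--\nalpha\nbeta\n--end foo--\nmore junk\n";
my $truncated = "junk\n--start foo--\nalpha\n";

open my $ok_fh, '<', \$complete or die "Cannot open in-memory file: $!";
my @block = extract_block($ok_fh);
print @block;

open my $bad_fh, '<', \$truncated or die "Cannot open in-memory file: $!";
my $err = eval { extract_block($bad_fh); 1 } ? '' : $@;
print "caught: $err" if $err;
```

Whether to die, warn, or quietly process the partial block depends on how much a truncated file matters in your situation.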
Re: Parsing Files for the Interesting Bit by Cody Pendant (Prior) on Jul 06, 2004 at 21:12 UTC
Thank you all for your help. I just wanted to note for posterity that my system works for the situation I was in, namely that "interesting bit start" means "interesting bit starts next line". The

if ( /start interesting/ .. /end interesting/ ) {
  print;
}

version prints the "start interesting" and "end interesting" lines themselves, whereas mine starts printing with the line after "start interesting" and stops before printing "end interesting".

($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print
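The flip-flop can be made to skip both marker lines by looking at the value it returns. In scalar context .. yields the empty string while false, then a sequence number starting at 1, with "E0" appended on the last line of the range. The following is only a sketch under assumed sample data; the variable names $seq and @inside are mine.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real file.
my $text = "junk\n--start foo--\nalpha\nbeta\n--end foo--\nmore junk\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @inside;
while ( defined( my $line = <$fh> ) ) {
    # '' while outside the range, then 1, 2, 3, ... inside it; the number
    # for the final line of the range carries an "E0" suffix.
    my $seq = ( $line =~ /--start foo--/ ) .. ( $line =~ /--end foo--/ );

    next unless $seq;         # outside the range
    next if $seq == 1;        # the start-marker line itself
    last if $seq =~ /E0$/;    # the end-marker line: stop reading
    push @inside, $line;      # a line strictly between the markers
}
close $fh;

print @inside;
```

This gives the same "start on the next line, stop before the end marker" behaviour as the flag version, at the cost of being a little less obvious to someone who has not met the flip-flop's return value before.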