Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I have, once again, a big text file which is mostly irrelevant except for the Interesting Bit. Say for instance it's
    Line 1 blah blah blah
    Line 2 blah blah blah
    ...
    Line 1000 blah blah blah
    Line 1001 --start foo--
    [lots of stuff I do need to process]
    Line 2000 --end foo--

So my probably not-very-programmerly instinct is to just do this:

    my $interesting_bit = 0;
    while(<FILE>){
        if(/--end foo--/){
            $interesting_bit = 0;
            last;   ## assuming only one interesting
                    ## bit per file of course
        }
        if($interesting_bit == 1){
            ## do my processing on the lines
        }
        if(/--start foo--/){
            $interesting_bit = 1;   # we've found the line which says
                                    # the next line needs to be processed
        }
    }

It works for me, but is that bad practice?

Doing it this way seems quick and straightforward once you've got the order sorted out, but rather clunky -- what do other monks think?



($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print

Re: Parsing Files for the Interesting Bit
by borisz (Canon) on Jul 05, 2004 at 00:24 UTC
    If it works, why change it? I would do it with the .. (flip-flop) operator:
        while (<FILE>) {
            if ( /--start foo--/ .. /--end foo--/ ) {
                # do what you like to do here
            }
        }
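    A self-contained sketch of the flip-flop approach, assuming the markers are exactly --start foo-- and --end foo-- and reading from an in-memory filehandle instead of FILE. The sub name flip_flop_lines is hypothetical. It shows that .. is true on both marker lines, so they end up in the result.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every line from the start marker through
# the end marker, inclusive, using .. (the flip-flop operator).
# Note: each .. keeps its own state, even across calls to this sub, so a
# file that ends inside the range leaves it "on" for the next call.
sub flip_flop_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @kept;
    while (my $line = <$fh>) {
        # True from the line matching the left regex up to and including
        # the line matching the right regex.
        if ($line =~ /--start foo--/ .. $line =~ /--end foo--/) {
            push @kept, $line;
        }
    }
    close $fh;
    return @kept;
}

my $sample = "blah\n--start foo--\nkeep me\n--end foo--\nmore blah\n";
print flip_flop_lines($sample);    # prints the two marker lines with "keep me" between them
```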
    Boris
Re: Parsing Files for the Interesting Bit
by Zaxo (Archbishop) on Jul 05, 2004 at 00:46 UTC

    Your markers look like they are fixed strings, so

        if ( $_ eq "--end foo--\n" ) {
            # . . .
        }
    seems like a better test. Leave off "\n" if you chomp it.

    A couple more approaches come to mind. If the file is not too big, you can abuse $/ to get the content in two gulps.

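        my $interesting_bit;
        {
            # maybe open here . . .
            local $/ = '--start foo--' . "\n";
            $interesting_bit = <FILE>;    # everything up to the start marker; discarded
            local $/ = '--end foo--' . "\n";
            $interesting_bit = <FILE>;    # the interesting bit, plus the end marker
            chomp $interesting_bit;       # chomp strips the current $/, i.e. the end marker
            # . . . and close here
        }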
    That could burden memory, since each gulp (including everything before the start marker) is held in a single scalar.
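    A runnable sketch of the two-gulp idea, assuming the markers are whole lines and using an in-memory filehandle in place of FILE. The sub name between_markers is hypothetical. It relies on chomp removing a trailing copy of $/ and returning the number of characters removed, which also tells us whether each marker was actually found.

```perl
use strict;
use warnings;

# Hypothetical helper: setting $/ to a marker string makes <$fh> read up to
# and including the next occurrence of that marker.
sub between_markers {
    my ($fh) = @_;

    # First gulp: everything up to and including the start marker.
    local $/ = "--start foo--\n";
    my $prefix = <$fh>;
    # chomp returns how many characters it removed, so a false result
    # means the start marker never appeared (undef in scalar context).
    return unless defined $prefix and chomp $prefix;

    # Second gulp: the interesting bit plus the end marker, which chomp strips.
    $/ = "--end foo--\n";
    my $body = <$fh>;
    return unless defined $body and chomp $body;    # end marker missing
    return $body;
}

my $text = "junk\njunk\n--start foo--\nline a\nline b\n--end foo--\ntrailing\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $interesting = between_markers($fh);
close $fh;
print $interesting if defined $interesting;    # prints "line a\nline b\n"
```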

    A third way is similar to yours, but uses the flip-flop operator to condense the code.

        my $interesting_bit;
        {
            # maybe open here . . .
            local $_;
            while (<FILE>) {
                if ( $_ eq "--start foo--\n" .. $_ eq "--end foo--\n" ) {
                    $interesting_bit .= $_;
                }
                elsif ( $interesting_bit ) {
                    last;
                }
            }
            # . . . and close
        }
    This makes the same assumption about there being only one interesting bit. If the variable is populated while the flip-flop is false, we've run past the end marker and can quit reading.

    I just noticed that I've changed the $interesting_bit variable from a flag in your code to an accumulator for the content. If you just set it true in the lhs of the flip-flop and do your processing in place of my .= operation, all will be well.
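    One way to read that suggestion, as a sketch: $interesting_bit goes back to being a flag, each line inside the range is passed to a processing callback instead of being appended, and the loop quits on the first line after the end marker. The sub name process_interesting_bit and the callback argument are hypothetical, and an in-memory filehandle stands in for FILE.

```perl
use strict;
use warnings;

# Hypothetical helper: flip-flop loop with $interesting_bit used as a flag.
# Only one interesting bit per file is assumed.
sub process_interesting_bit {
    my ($fh, $process) = @_;
    my $interesting_bit = 0;
    while (my $line = <$fh>) {
        if ($line eq "--start foo--\n" .. $line eq "--end foo--\n") {
            $interesting_bit = 1;
            $process->($line);    # marker lines are included, as in the loop above
        }
        elsif ($interesting_bit) {
            last;                 # flip-flop is false again: we are past the end marker
        }
    }
    return $interesting_bit;
}

my $text = "junk\n--start foo--\nkeep\n--end foo--\nnever processed\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @seen;
process_interesting_bit($fh, sub { push @seen, $_[0] });
close $fh;
print @seen;
```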

    After Compline,
    Zaxo

Re: Parsing Files for the Interesting Bit
by dws (Chancellor) on Jul 05, 2004 at 01:03 UTC

    I might do something like:

        while ( <FILE> ) {
            if ( /--start foo--/ ) {
                # we're about to start processing
                while ( <FILE> ) {
                    last if /--end foo--/;
                    # here's a line of stuff to process
                }
                # we're done with this block
            }
        }

    This approach eliminates the flag and the per-line test on it, at the risk of not noticing that a file has ended mid-block. If that's liable to be a problem, it's a straightforward mod to catch it.
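    A sketch of one such mod, assuming the same markers: the inner loop records whether it actually saw the end marker, and the code dies if the file ran out first. The sub name nested_loop_lines is hypothetical, and an in-memory filehandle stands in for FILE.

```perl
use strict;
use warnings;

# Hypothetical helper: the nested-loop approach, returning the lines between
# the markers (markers excluded) and dying if a block is never closed.
sub nested_loop_lines {
    my ($fh) = @_;
    my @kept;
    while (my $line = <$fh>) {
        next unless $line =~ /--start foo--/;
        my $closed = 0;
        while (my $inner = <$fh>) {
            if ($inner =~ /--end foo--/) {
                $closed = 1;
                last;
            }
            push @kept, $inner;    # a line of stuff to process
        }
        # The inner loop only falls through without $closed at end of file.
        die "File ended inside a --start foo-- block\n" unless $closed;
    }
    return @kept;
}

my $text = "junk\n--start foo--\nkeep\n--end foo--\ntrailing\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print nested_loop_lines($fh);    # prints "keep\n"
close $fh;
```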

Re: Parsing Files for the Interesting Bit
by Cody Pendant (Prior) on Jul 06, 2004 at 21:12 UTC
    Thank you all for your help.

    I just wanted to note for posterity that my system works for the situation I was in, namely where the "start interesting" line means "the interesting bit starts on the next line".

    The

        if (/start interesting/ .. /end interesting/) { print; }

    version also prints the "start interesting" and "end interesting" marker lines themselves, whereas mine starts printing with the line after "start interesting" and stops before printing "end interesting".
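    For what it's worth, the flip-flop can also skip the markers. In scalar context .. returns a sequence number starting at 1 while true, and the last number in a range has "E0" appended, so the start line is number 1 and the end line matches /E0\z/. A sketch, with the hypothetical sub name inner_lines and an in-memory string standing in for the file:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines strictly between the markers.
sub inner_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @kept;
    while (my $line = <$fh>) {
        # '' when false, otherwise 1, 2, 3, ... with "E0" on the final line.
        my $seq = ($line =~ /start interesting/ .. $line =~ /end interesting/);
        next unless $seq;          # outside the range
        next if $seq == 1;         # the "start interesting" line
        next if $seq =~ /E0\z/;    # the "end interesting" line
        push @kept, $line;
    }
    close $fh;
    return @kept;
}

print inner_lines("x\nstart interesting\none\ntwo\nend interesting\ny\n");    # prints "one\ntwo\n"
```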

