in reply to Extracting blocks of text

Here is an easy solution:

#!/usr/bin/perl
use strict;

my $inputfile = shift;
my $withinBlock = 0;

open (IN, "<$inputfile") || die "could not open $inputfile\n";
while (<IN>) {
    if (/head/) {
        $withinBlock = 1;
        print $_;
        if (/tail/) {
            $withinBlock = 0;
            print "\n";
        }
    }
    if ($withinBlock) {
        print $_;
        if (/tail/) {
            $withinBlock = 0;
            print "\n";
        }
    }
}
close (IN);

I ran it with this file:
bla head gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla tail gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus head bla bla gugus gugus tail bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus head bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus tail gugus gugus
and it showed
bla head gugus gugus bla bla gugus gugus bla head gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla tail gugus bla bla gugus gugus bla bla gugus head bla bla gugus gugus tail bla bla gugus gugus bla bla gugus gugus head bla bla gugus gugus bla bla gugus gugus head bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus bla bla gugus gugus tail gugus gugus
It does not work properly if a head follows a tail on the same line ...
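
For the same-line case, one possible approach is to walk through the markers in the order they appear on each line, so that a head after a tail re-opens the block. This is only a sketch, not a tested replacement for the script above: the sub name extract_blocks is hypothetical, and it assumes that a line should be printed exactly once whenever any part of it lies inside a block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns those that
# belong to a head..tail block, each line at most once.
sub extract_blocks {
    my @lines  = @_;
    my $within = 0;    # are we inside a block after the previous line?
    my @out;

    for my $line (@lines) {
        # Print the line if a block was already open when it started ...
        my $printed = $within;

        # ... or if a head shows up anywhere on it. Markers are handled
        # left to right, so "tail ... head" leaves the block open.
        while ($line =~ /(head|tail)/g) {
            if ($1 eq 'head') {
                $within  = 1;
                $printed = 1;
            }
            else {
                $within = 0;
            }
        }
        push @out, $line if $printed;
    }
    return @out;
}
```

With the file handle from the script above it could be called as print extract_blocks(<IN>); note that this reads the whole file into memory first, which is fine for small inputs only.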
pelagic

Re: Re: Extracting blocks of text
by walker (Initiate) on Feb 01, 2004 at 03:40 UTC
    This one worked GREAT! I need to print 5 lines after the "tail" keyword... and I don't understand why there are 2 tests for tail and 2 print commands?
      I need to print 5 lines after the "tail" key word...

      Why didn't you say so in the first place? That would change how people answer the question.

      and I don't understand why there are 2 tests for tail and 2 print commands?

      Well, actually, there's no need for the duplication. The following would work just as well -- and would cover your little "amendment" to the original spec:

      #!/usr/bin/perl
      use strict;

      my $inputfile = shift;
      my $withinBlock = 0;    # 0 = output off, 6 = inside a block, 1..5 = lines left after "tail"

      open (IN, "<$inputfile") || die "could not open $inputfile\n";
      while (<IN>) {
          if (/head/) {
              $withinBlock = 6;    # (re)enter a block
          }
          if ($withinBlock) {
              print $_;
              # count down only after a "tail" has been seen
              $withinBlock-- unless $withinBlock == 6;
          }
          if (/tail/) {
              $withinBlock = 5;    # print five more lines, then stop
          }
      }
      close (IN);

      Note that if there is a new "head" line within the five lines that follow a "tail", the $withinBlock state variable gets reset to 6, and will stay there till the next "tail". If there is no "head" within the next five lines, it will decrement to 0, turning off the output.

      Another "feature" of this version is that if there is a "tail" line without a previous "head", the five lines following "tail" will still get printed. One more thing: since the head and tail regexes are not anchored, the logic will fire whenever these words happen to show up in the data -- e.g.:

      blah blah head This is a bunch of text in a target block. It includes excerpts from a book on animals, which have tails. So this line will cause the output to be turned off after the next five lines, i.e. here. So you won't get to see this line or this one. tail But you'll see this one and these lines too. Now the output is off again, but since we're talking about animals, which all have heads, the output is now on again, and you see the previous and current lines, as well as this and the next two...
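
If the markers in the real data have a fixed position, anchoring the patterns would avoid those accidental matches. The sketch below uses the same countdown logic but takes the patterns and the number of trailing lines as parameters. The sub name extract_with_trailer and the idea that markers start at the beginning of a line are assumptions for illustration; the original question does not say what the markers really look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same state machine as above, but the marker
# patterns and the number of lines to print after "tail" are arguments.
sub extract_with_trailer {
    my ($lines, $head_re, $tail_re, $after) = @_;
    my $inside    = $after + 1;    # sentinel value meaning "inside a block"
    my $countdown = 0;             # 0 = output off
    my @out;

    for my $line (@$lines) {
        $countdown = $inside if $line =~ $head_re;
        if ($countdown) {
            push @out, $line;
            # count down only once a tail has been seen
            $countdown-- unless $countdown == $inside;
        }
        $countdown = $after if $line =~ $tail_re;
    }
    return @out;
}

# Assumed marker format: the keyword at the very start of the line.
my @sample = (
    "intro\n",
    "head\n",
    "animals have tails and heads\n",
    "tail\n",
    "one\n",
    "two\n",
    "three\n",
);
my @result = extract_with_trailer(\@sample, qr/^head\b/, qr/^tail\b/, 2);
print @result;
```

With the anchored patterns, the line about animals no longer switches the output on or off, and only the two lines after the real "tail" line are printed. Passing 5 as the last argument gives the five-line behaviour asked for above.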
        My most sincere apologies for the revised requirement. The new/expanded goal was discovered after I applied your solution.
        Also, thanks for all the extra explanation. It really helps clear things up.