riz has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am reading a file and am interested only in the lines between two keywords. The keywords appear several times in the file. The first keyword is a variable string and the second one is constant, e.g.:

mykeyword1
<several data lines>
***END***
mykeyword2
<several lines>
***END***
mykeyword3
<several lines>
***END***
mykeyword4
<again several lines>
***END***
and so on

If I am only interested in the sets of lines under keyword2 and keyword4, how would I do that?
Thanks,
Saad.

Re: searching data lines between keywords
by tlm (Prior) on Jun 14, 2005 at 09:50 UTC

    Look into the "scalar-context version" of the .. operator in perlop. E.g.

    while ( <> ) {
        if ( my $c = /^(mykeyword2|mykeyword4)$/ .. /^\Q***END***\E$/ ) {
            next if $c == 1 || $c =~ /E/;    # skip the keyword line and the ***END*** line
            print;
        }
    }

    Update: Simplified second regexp slightly.
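To see why the test $c == 1 || $c =~ /E/ skips exactly the keyword and ***END*** lines, it helps to print what the flip-flop returns on each line. Per perlop, it returns the empty string while outside a range, 1 on the line that starts the range, increasing numbers inside it, and a number with "E0" appended on the line that ends it. A minimal sketch; the sample text, the in-memory filehandle, and the @seq array are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample input in the format from the question.
my $input = "mykeyword1\nskip me\n***END***\nmykeyword2\nkeep 1\nkeep 2\n***END***\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @seq;    # flip-flop value observed on each line
while ( my $line = <$fh> ) {
    # Scalar-context .. : "" until the left test succeeds, then a count
    # that gets "E0" appended on the line where the right test succeeds.
    my $c = ( $line =~ /^mykeyword2$/ ) .. ( $line =~ /^\Q***END***\E$/ );
    push @seq, $c;
    print "[$c] $line";
}
close $fh;
```

Lines outside the mykeyword2 block show [], the keyword line shows [1], and the ***END*** line shows [4E0], which is what the next-if test filters out.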

    the lowliest monk

Re: searching data lines between keywords
by Tomtom (Scribe) on Jun 14, 2005 at 10:13 UTC
    You could take a look at the grep function too.
    my @filtered = grep { !m/(?:mykeyword|\*\*\*END\*\*\*)/ } <DATA>;    # drops only the marker lines
    print @filtered;

    __DATA__
    mykeyword1
    <several data lines>
    ***END***
    mykeyword2
    <several lines>
    ***END***
    mykeyword3
    <several lines>
    ***END***
    mykeyword4
    <again several lines>
    ***END***
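As written, that grep only removes the keyword and ***END*** lines; every data line survives, whichever block it belongs to. One way to restrict it to particular blocks is to put tlm's flip-flop inside the grep block, which Perl evaluates in scalar context, so the .. keeps its state from one element to the next. A sketch, assuming the wanted keywords are exactly mykeyword2 and mykeyword4 on lines of their own (the sample data and in-memory filehandle are made up):

```perl
use strict;
use warnings;

my $input = join '', map { "$_\n" }
    'mykeyword1', 'junk 1',  '***END***',
    'mykeyword2', 'want 2a', 'want 2b', '***END***',
    'mykeyword3', 'junk 3',  '***END***',
    'mykeyword4', 'want 4',  '***END***';
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

# The grep block runs in scalar context, so .. acts as a flip-flop whose
# state carries over from one list element to the next.
my @filtered = grep {
    my $c = /^mykeyword[24]$/ .. /^\*\*\*END\*\*\*$/;
    $c && $c != 1 && $c !~ /E0$/;    # inside a wanted block, markers excluded
} <$fh>;
close $fh;

print @filtered;
```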
Re: searching data lines between keywords
by salva (Canon) on Jun 14, 2005 at 10:35 UTC
    and a less fancy approach:
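    OUT: while (<>) {
        next unless /keyword2/;        # skip lines until keyword2 shows up
        while (<>) {
            last OUT if /keyword4/;    # stop reading entirely at keyword4
            process_line($_);          # process_line: your own per-line handler
        }
    }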
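The inner loop above runs until it sees keyword4, so it hands back everything from keyword2 up to keyword4, including the mykeyword3 block. If only the data lines of the keyword2 and keyword4 blocks are wanted, one variant ends the inner loop at ***END*** instead and lets the outer loop look for the next wanted keyword. A sketch; the sample data, the in-memory filehandle, and the @kept array are made up:

```perl
use strict;
use warnings;

my $input = "mykeyword1\nj1\n***END***\nmykeyword2\nw2\n***END***\n"
          . "mykeyword3\nj3\n***END***\nmykeyword4\nw4\n***END***\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @kept;
while ( my $line = <$fh> ) {
    # Skip everything until a wanted keyword line.
    next unless $line =~ /^mykeyword[24]$/;

    # Collect this block's data lines; stop at its ***END*** marker.
    while ( my $inner = <$fh> ) {
        last if $inner =~ /^\*\*\*END\*\*\*$/;
        push @kept, $inner;
    }
}
close $fh;

print @kept;
```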
Re: searching data lines between keywords
by lupey (Monk) on Jun 14, 2005 at 12:31 UTC
    TMTOWTDI, but a longer and less elegant approach:
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash;
    my $currentkey;
    my $inkey = 0;
    while (<DATA>) {
        chomp;
        next if /^\s*$/; # skip blank lines
        if ($inkey == 0) {
            $currentkey = $_;
            $inkey = 1;
            next;
        }
        if (/^\*+END\*+$/) {
            $inkey = 0;
            next;
        }
        push @{$hash{$currentkey}}, $_;
    }
    print Dumper(%hash), $/;

    __DATA__
    mykeyword1
    foo1
    bar1
    ***END***
    mykeyword2
    ***END***
    mykeyword3
    baz3
    foo3
    bar3
    ***END***
    mykeyword4
    baz4
    ***END***

    Output:

    $VAR1 = 'mykeyword3';
    $VAR2 = [
              'baz3',
              'foo3',
              'bar3'
            ];
    $VAR3 = 'mykeyword1';
    $VAR4 = [
              'foo1',
              'bar1'
            ];
    $VAR5 = 'mykeyword4';
    $VAR6 = [
              'baz4'
            ];
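Once the blocks are stored in a hash of arrays, picking out the ones the question asks about is just a lookup by key. A sketch that reuses lupey's parsing loop, with two assumptions of mine: it reads from a made-up in-memory string rather than __DATA__, and it creates an empty array for every keyword so that empty blocks are not lost. It then collects only mykeyword2 and mykeyword4:

```perl
use strict;
use warnings;

my $input = "mykeyword1\nfoo1\n***END***\nmykeyword2\nbar2\n***END***\n"
          . "mykeyword3\nbaz3\n***END***\nmykeyword4\nqux4\nquux4\n***END***\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %hash;
my $currentkey;
my $inkey = 0;
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line =~ /^\s*$/;            # skip blank lines
    if ( !$inkey ) {                     # first line of a block is its keyword
        $currentkey = $line;
        $hash{$currentkey} ||= [];       # keep keywords whose block is empty
        $inkey = 1;
        next;
    }
    if ( $line =~ /^\*+END\*+$/ ) {      # end of the current block
        $inkey = 0;
        next;
    }
    push @{ $hash{$currentkey} }, $line;
}
close $fh;

# Only the blocks the question asks about, in a fixed order.
my @wanted = map { @{ $hash{$_} || [] } } qw(mykeyword2 mykeyword4);
print "$_\n" for @wanted;
```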

    Lupey

      Where did mykeyword2 go? Its block has no data lines, so nothing is ever pushed for it and the key is never created. Perhaps this was a design decision. If not, adding the line @{$hash{$currentkey}} = (); into the if( $inkey == 0 ) block does the trick.

      P.S.: To make Data::Dumper print nicer output, pass it a hash reference, as in Dumper(\%hash). The output then looks like this:

      $VAR1 = {
                'mykeyword3' => [
                                  'baz3',
                                  'foo3',
                                  'bar3'
                                ],
                'mykeyword2' => [],
                'mykeyword1' => [
                                  'foo1',
                                  'bar1'
                                ],
                'mykeyword4' => [
                                  'baz4'
                                ]
              };
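Perl hash order is arbitrary, so the key order in a dump like the one above is not guaranteed (newer perls even vary it between runs). Setting $Data::Dumper::Sortkeys to a true value makes Data::Dumper emit the keys sorted, which helps when comparing output. A small sketch with made-up data:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted order

my %hash = (
    mykeyword2 => [],
    mykeyword1 => [ 'foo1', 'bar1' ],
);
my $dump = Dumper( \%hash );    # a reference gives a single $VAR1 = { ... }
print $dump;
```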

      P.P.S.: Can some enlightened monk tell me the preferred way to "touch" an array (reference)? That is, I only want to create an empty array if it doesn't already exist (see my @{$hash{$currentkey}} = (); addition above). The snippet push @{$hash{$currentkey}}; works but produces a "Useless use of push with no values" warning.


        I typically use:

        $hash{$currentkey} ||= [];

        This sets it to an empty array reference if it isn't already a true value, and undefined isn't true. Of course, there are lots of other false values as well (the empty string, 0, etc.), but an existing array reference is always true, so it is left alone.
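A quick check of the ||= idiom: an existing array reference is left alone, while a missing (undef) entry gets a fresh empty array. The keys and values here are made up for illustration:

```perl
use strict;
use warnings;

my %hash = ( existing => [ 'foo', 'bar' ] );

$hash{existing} ||= [];    # already a true value (an array ref): unchanged
$hash{missing}  ||= [];    # undef is false: becomes a new empty array ref

my $n_existing = @{ $hash{existing} };
my $n_missing  = @{ $hash{missing} };
print "existing: $n_existing, missing: $n_missing\n";    # prints "existing: 2, missing: 0"
```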

        @{$hash{$currentkey}} is an array, so you can truncate it just as you would any other array; you don't need push. Either of these will do the trick:

        @{$hash{$currentkey}} = ();
        $#{$hash{$currentkey}} = -1;
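Both of those forms empty the array even when it already holds data, so they answer a slightly different question from the $hash{$currentkey} ||= [] idiom, which leaves an existing array untouched. A small sketch of the two truncation forms, with made-up keys and values:

```perl
use strict;
use warnings;

my %hash = ( k1 => [ 1, 2, 3 ], k2 => [ 4, 5 ] );

@{ $hash{k1} } = ();     # assign an empty list to the array
$#{ $hash{k2} } = -1;    # set the last index to -1, i.e. zero elements

my $len1 = @{ $hash{k1} };
my $len2 = @{ $hash{k2} };
print "k1: $len1, k2: $len2\n";    # prints "k1: 0, k2: 0"
```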

        Lupey

Re: searching data lines between keywords
by graff (Chancellor) on Jun 14, 2005 at 21:33 UTC
    Yet another approach that would work for the kind of data you posted:
    {
        local $/ = '***END***';
        while (<>) {
            print if ( /mykeyword2|mykeyword4/ );
        }
    }
    That sets perl's "input record separator" to the end-of-record string, instead of the default end-of-line string ("\n"; on Windows, perl's default I/O layer converts "\r\n" to "\n" as it reads). In the version shown above, the line-termination character(s) following each "***END***" will be included at the beginning of the next record. If you prefer (and if you know for sure that your input data will always use the same style of line termination), you can set $/ like this:
    local $/ = "***END***\n"; # or "***END***\r\n"
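To illustrate the difference, here is a sketch that reads a small made-up sample from an in-memory file with $/ set to "***END***\n" (assuming "\n" line endings) and records the first line of each record. Because the newline after each marker now belongs to the separator, every record starts directly with its keyword:

```perl
use strict;
use warnings;

my $input = "mykeyword1\na\n***END***\nmykeyword2\nb\nc\n***END***\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @first_lines;
{
    local $/ = "***END***\n";    # one record per block, terminator included
    while ( my $record = <$fh> ) {
        my ($first) = split /\n/, $record;
        push @first_lines, $first;
    }
}
close $fh;

print "$_\n" for @first_lines;
```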
    UPDATE:

    Having seen riz's reply below, I have to assume that s/he didn't understand what I said, so here's a full, tested version of the approach I described:

    #!/usr/bin/perl
    use strict;

    my @keepers;
    {
        local $/ = '***END***';
        while ( <DATA> ) {
            next unless ( /^\s*mykeyword[24]/ );
            chomp;
            push @keepers, $_;
        }
    }
    print join '', @keepers;

    __DATA__
    mykeyword1
    several data lines containing junk
    ***END***
    mykeyword2
    several lines containing target data
    ***END***
    mykeyword3
    several lines containing junk
    ***END***
    mykeyword4
    again several lines containing target data
    ***END***
    mykeyword1
    several data lines containing junk
    ***END***
    mykeyword2
    several lines containing target data
    ***END***
    mykeyword3
    several lines containing junk
    ***END***
    mykeyword4
    again several lines containing target data
    ***END***
    Note that when $/ is set to some non-default value, the "chomp" function uses that value to remove the record delimiter string from the end of its operand ($_ in this case).
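A minimal illustration of that behavior, using a made-up record string: with $/ localized to '***END***', chomp strips that marker rather than a newline, and returns the number of characters it removed:

```perl
use strict;
use warnings;

my $record = "mykeyword2\ntarget data\n***END***";
my $removed;
{
    local $/ = '***END***';
    $removed = chomp $record;    # removes the trailing '***END***'
}
print "removed $removed characters\n";    # prints "removed 9 characters"
print $record;
```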
      Hi,
      Thanks everybody for the help.
      Maybe I was not able to explain my question well. I tried all of your code, but it gets everything between keyword2 and keyword4.

      The output should only contain the lines between (keyword2 & ***END***) and (keyword4 & ***END***).
      Regards,
      riz.
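For the output riz describes (only the lines between each wanted keyword and its ***END***, without the marker lines themselves), graff's record-separator approach needs one more step: drop the keyword line from each matching record. A sketch, assuming "\n" line endings and keywords on lines of their own; the sample data, in-memory filehandle, and @keepers array are made up:

```perl
use strict;
use warnings;

my $input = "mykeyword1\njunk\n***END***\nmykeyword2\nwant 2\n***END***\n"
          . "mykeyword3\njunk\n***END***\nmykeyword4\nwant 4a\nwant 4b\n***END***\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @keepers;
{
    local $/ = "***END***\n";
    while ( my $record = <$fh> ) {
        chomp $record;    # strip the "***END***\n" terminator
        # Remove the keyword line; records for other keywords are skipped.
        next unless $record =~ s/\A\s*mykeyword[24]\n//;
        push @keepers, $record;
    }
}
close $fh;

print @keepers;
```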