searching data lines between keywords

riz has asked for the wisdom of the Perl Monks concerning the following question:

Re: searching data lines between keywords by tlm (Prior) on Jun 14, 2005 at 09:50 UTC
Look into the "scalar-context version" of the `..` operator in perlop. E.g.

    while ( <> ) {
        if ( my $c = /^(mykeyword1|mykeyword2)$/ .. /^\Q*END\E$/ ) {
            next if $c == 1 || $c =~ /E/;
            print;
        }
    }

Update: Simplified second regexp slightly.

the lowliest monk
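To see why the `$c == 1` and `$c =~ /E/` tests skip the two delimiter lines, here is a small sketch that records what the flip-flop returns for every line. The sample lines and the helper name `flip_flop_values` are made up for illustration, and the sketch assumes the end marker is a line reading exactly `*END`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [value, line] pairs showing what the
# scalar-context range operator yields for each input line.
sub flip_flop_values {
    my @lines = @_;
    my @seen;
    for (@lines) {
        # Empty string (false) outside a range; 1, 2, 3, ... inside it.
        # The last line of a range gets "E0" appended to its number.
        my $c = /^mykeyword2$/ .. /^\Q*END\E$/;
        push @seen, [ $c, $_ ];
    }
    return @seen;
}

# Print each flip-flop value in brackets next to the line it belongs to.
printf "%-6s %s\n", "[$_->[0]]", $_->[1]
    for flip_flop_values(qw(mykeyword1 junk *END mykeyword2 foo bar *END));
```

The keyword line is the one numbered 1 and the `*END` line is the one carrying the `E0` suffix, so skipping those two leaves only the data lines. Note that the operator keeps its state between calls, so the sketch relies on the input ending with a closing `*END`.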
Re: searching data lines between keywords by Tomtom (Scribe) on Jun 14, 2005 at 10:13 UTC
You could take a look at the grep function too.

    my @filtered = grep { !m/(?:mykeyword|\*END\*)/ } <DATA>;
    print @filtered;

    __DATA__
    mykeyword1
    <several data lines>
    *END*
    mykeyword2
    <several lines>
    *END*
    mykeyword3
    <several lines>
    *END*
    mykeyword4
    <again several lines>
    *END*
Re: searching data lines between keywords by salva (Canon) on Jun 14, 2005 at 10:35 UTC
And a less fancy approach:

    OUT: while (<>) {
        next unless /keyword2/;
        while (<>) {
            last OUT if /keyword4/;
            process_line($_);
        }
    }
Re: searching data lines between keywords by lupey (Monk) on Jun 14, 2005 at 12:31 UTC
TMTOWTDI, but here is a longer and less elegant approach:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash;
    my $currentkey;
    my $inkey = 0;

    while (<DATA>) {
        chomp;
        next if /^\s*$/;    # skip blank lines
        if ($inkey == 0) {
            $currentkey = $_;
            $inkey = 1;
            next;
        }
        if (/^\*END\*$/) {
            $inkey = 0;
            next;
        }
        push @{$hash{$currentkey}}, $_;
    }
    print Dumper(%hash), $/;

    __DATA__
    mykeyword1
    foo1
    bar1
    *END*
    mykeyword2
    *END*
    mykeyword3
    baz3
    foo3
    bar3
    *END*
    mykeyword4
    baz4
    *END*

Output:

    $VAR1 = 'mykeyword3';
    $VAR2 = [
              'baz3',
              'foo3',
              'bar3'
            ];
    $VAR3 = 'mykeyword1';
    $VAR4 = [
              'foo1',
              'bar1'
            ];
    $VAR5 = 'mykeyword4';
    $VAR6 = [
              'baz4'
            ];

Lupey
Re^2: searching data lines between keywords by kaif (Friar) on Jun 14, 2005 at 15:37 UTC
Where did `keyword2` go? Perhaps this was a design decision. If not, adding the line `@{$hash{$currentkey}} = ();` to the `if( $inkey == 0 )` block does the trick.

P.S.: To make Data::Dumper print much nicer output, pass it a hash reference, as in `Dumper(\%hash)`. Then the output is as follows:

    $VAR1 = {
              'mykeyword3' => [
                                'baz3',
                                'foo3',
                                'bar3'
                              ],
              'mykeyword2' => [],
              'mykeyword1' => [
                                'foo1',
                                'bar1'
                              ],
              'mykeyword4' => [
                                'baz4'
                              ]
            };

P.P.S.: Can some enlightened monk tell me the preferred way to "`touch`" an array (reference)? That is, if I only want to clear an array if it doesn't already exist (see my `@{$hash{$currentkey}} = ();` addition above). The snippet `push @{$hash{$currentkey}};` works but produces a `Useless use of push with no values` warning.
Re^3: searching data lines between keywords by jhourcle (Prior) on Jun 14, 2005 at 16:32 UTC
> P.P.S.: Can some enlightened monk tell me the preferred way to "touch" an array (reference)? That is, if I only want to clear an array if it doesn't already exist (see my `@{$hash{$currentkey}} = ();` addition above). The snippet `push @{$hash{$currentkey}};` works but produces a `Useless use of push with no values` warning.

I typically use:

    $hash{$currentkey} ||= [];

which sets it to an empty array ref if it isn't already a 'true' value. Undefined isn't true, but of course there are lots of other untrue values as well (empty string, 0, etc.).
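A minimal sketch of the `||=` idiom in action, with made-up keys and values. The last assignment uses `//=`, which tests definedness rather than truth; it needs Perl 5.10 or later, so it is newer than this thread and is shown only as an alternative.

```perl
use strict;
use warnings;
use 5.010;    # assumption: a perl new enough to have the // operator

my %hash;

# Create the empty array only if the slot currently holds a false value.
$hash{mykeyword2} ||= [];

# An existing array is left alone, because an array reference is always true.
$hash{mykeyword1} = [ 'foo1', 'bar1' ];
$hash{mykeyword1} ||= [];

# Perl 5.10+: assign only if the slot is undefined.
$hash{mykeyword3} //= [];

print "$_: ", scalar @{ $hash{$_} }, " line(s)\n" for sort keys %hash;
```

Because a reference is never false, `||=` and `//=` behave the same as long as the slot only ever holds an array reference or nothing at all.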
Re^3: searching data lines between keywords by lupey (Monk) on Jun 14, 2005 at 16:50 UTC
`@{$hash{$currentkey}}` is an array, so you can truncate it just as you would any other array. You don't need to use push. Either of these will do the trick:

    @{$hash{$currentkey}} = ();
    $#{$hash{$currentkey}} = -1;

Lupey
Re^4: searching data lines between keywords by kaif (Friar) on Jun 14, 2005 at 16:57 UTC
Re: searching data lines between keywords by graff (Chancellor) on Jun 14, 2005 at 21:33 UTC
Yet another approach that would work for the kind of data you posted:

    {
        local $/ = '*END';
        while (<>) {
            print if ( /mykeyword2|mykeyword4/ );
        }
    }

That sets perl's "input record separator" to be the end-of-record string, instead of the default end-of-line string ("\n" or "\r\n", depending on your OS). In the version shown above, the line-termination character(s) following each "END" will be included at the beginning of the next record. If you prefer (and if you know for sure that your input data will always use the same style of line-termination), you can set $/ like this:

    local $/ = "END\n"; # or "END\r\n"

UPDATE: Having seen ~~AM's~~ riz's reply below, I have to assume that s/he didn't understand what I said, so here's a full, tested version of the approach I described:

    #!/usr/bin/perl
    use strict;

    my @keepers;
    {
        local $/ = '*END';
        while ( <DATA> ) {
            next unless ( /^\s*mykeyword[24]/ );
            chomp;
            push @keepers, $_;
        }
    }
    print join '', @keepers;

    __DATA__
    mykeyword1
    several data lines
    containing junk
    *END
    mykeyword2
    several lines
    containing target data
    *END
    mykeyword3
    several lines
    containing junk
    *END
    mykeyword4
    again several lines
    containing target data
    *END
    mykeyword1
    several data lines
    containing junk
    *END
    mykeyword2
    several lines
    containing target data
    *END
    mykeyword3
    several lines
    containing junk
    *END
    mykeyword4
    again several lines
    containing target data
    *END

Note that when $/ is set to some non-default value, the "chomp" function uses that value to remove the record delimiter string from the end of its operand ($_ in this case).
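The interaction between `$/` and `chomp` can be checked without a data file. The sketch below reads from an in-memory filehandle; the helper name `read_records` and the sample text are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records that end with $sep,
# removing the separator from each record with chomp.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $sep;               # record separator for this sub only
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;                # strips $sep here, not "\n"
        push @records, $rec;
    }
    close $fh;
    return @records;
}

my @records = read_records("a\nb\n*END\nc\n*END\n", '*END');
printf "%d records\n", scalar @records;
```

Each record after the first starts with the newline that followed the previous `*END`, which is why a pattern such as `/^\s*mykeyword[24]/` has to allow for leading whitespace. The final newline of the input also comes back as a short record of its own.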
Re^2: searching data lines between keywords by Anonymous Monk on Jun 16, 2005 at 09:12 UTC
Hi, thanks everybody for the help. Maybe I did not explain my question well. I tried all your code, but it gets everything between keyword2 and keyword4. The output should contain only the lines between (keyword2 & *END) and (keyword4 & *END). Regards, riz.
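One reading of that requirement, as a minimal sketch rather than a tested answer from the thread: keep a data line only while inside a block that was opened by one of the wanted keywords. It assumes the lines are already chomped, that keyword and end-marker lines match exactly, and that the marker is `*END`. The helper name `lines_between` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the data lines that follow one of the wanted
# keyword lines, up to (but not including) the next end-marker line.
sub lines_between {
    my ($wanted, $end, @lines) = @_;
    my %is_wanted = map { $_ => 1 } @$wanted;
    my $collecting = 0;
    my @out;
    for my $line (@lines) {
        if    ( $line eq $end )     { $collecting = 0 }       # block closed
        elsif ( $is_wanted{$line} ) { $collecting = 1 }       # wanted block opened
        elsif ( $collecting )       { push @out, $line }      # data line to keep
    }
    return @out;
}

my @data = qw(mykeyword1 junk1 *END mykeyword2 a b *END
              mykeyword3 junk3 *END mykeyword4 c *END);
print "$_\n" for lines_between( [qw(mykeyword2 mykeyword4)], '*END', @data );
```

An explicit flag is used here instead of the `..` operator so that the helper carries no state from one call to the next.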