grab only if pattern matches previous line

nkpgmartin has asked for the wisdom of the Perl Monks concerning the following question:

Re: grab only if pattern matches previous line by Tomte (Priest) on Jul 09, 2003 at 19:22 UTC
Untested, but maybe a hint to a better solution :) `my $matched = 0; my $blah1; foreach (@lines) { if (m/^request\((.+)$/) { $blah1 = $1; $matched = 1; next; } if (m/^\s*START_TIME,(.+)$/ && $matched) { print $blah1, ':', $1, "\n"; } $matched = 0; }` regards, tomte Hlade's Law: If you have a difficult task, give it to a lazy person -- they will find an easier way to do it.
Re: grab only if pattern matches previous line by Albannach (Monsignor) on Jul 09, 2003 at 21:39 UTC
This is a slightly different take on Tomte's concept, remembering that we only want it to catch specific request lines. Though pzbagel's approach is also good, this one has the advantage of not needing the data in an array (you could read it directly from a file), and of not requiring the following line(s) of interest to be at fixed offsets from the 'request' line. In pzbagel's version you could read ahead if using a file, but that's a bit messier, as you need to check for EOF at each read. How you end up doing it will depend on further examination of your data, and some consideration of what else you might have to do with it later. `my $buf; for my $line (@lines) { if ($line =~ /^request\((.+)$/) { $buf = $1; undef $buf if $buf =~ /BADSTRING/; } elsif (defined $buf) { print "$buf: ", (split ', ', $line)[1], "\n"; undef $buf; } }` -- I'd like to be able to assign to an luser
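The point about reading directly from a file can be sketched roughly as below. This is only an illustration of the same buffer idea, not Albannach's code: the sub name `pairs_from_fh`, the in-memory sample data, and the comma-plus-optional-space separator are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a filehandle line by line and return
# "name: value" strings for each accepted request.
sub pairs_from_fh {
    my ($fh, $bad) = @_;
    my ($buf, @out);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^request\((.+)$/) {
            # Remember the request name unless it contains the bad string.
            $buf = $1;
            undef $buf if $buf =~ /\Q$bad\E/;
        }
        elsif (defined $buf) {
            # First line after an accepted request: take the second field.
            my $value = (split /,\s*/, $line, 2)[1];
            push @out, "$buf: $value" if defined $value;
            undef $buf;
        }
    }
    return @out;
}

# Assumed sample data, read through an in-memory filehandle.
my $data = join "\n",
    'request(goodstring',    'START_TIME,some goodness',
    'request(xxBADSTRINGxx', 'START_TIME,some bad stuff',
    'request(randomstring',  'START_TIME,more goodness', '';
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for pairs_from_fh($fh, 'BADSTRING');
close $fh;
```

Swapping the in-memory handle for one opened on a real file should leave the loop unchanged, and only one line is held in memory at a time.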
Re: grab only if pattern matches previous line by graff (Chancellor) on Jul 10, 2003 at 00:58 UTC
I don't think Tomte's solution is kludgy at all (and I really like Albannach's version of it). Another possibility that might work for you, depending on what the data really look like, is to change the input record separator so that each read returns one whole request: `{ local $/ = "request("; while (<>) { if ( /^(.+)\s+START_TIME, (.+)/ and $1 ne "BADSTRING" ) { print "$1:$2\n"; } } }` This assumes that the data stream is a series of records that all start with `"request("` and have line breaks as indicated in your example (because `/(.+)/` does not match a new-line character). But if your data varies from that a bit, it might still be pretty easy to adapt this idea to handle it.
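For anyone who wants to try the record-separator idea without a data file, here is a rough, self-contained sketch. The sub name `records_from_fh` and the sample data are assumptions, and it tests for BADSTRING as a substring (via `index`) because the original poster later says the bad string is not the entire request name.

```perl
use strict;
use warnings;

# Hypothetical helper: read "request(" delimited records from a filehandle.
sub records_from_fh {
    my ($fh, $bad) = @_;
    my @out;
    local $/ = "request(";      # each read now ends at the next "request("
    while (my $rec = <$fh>) {
        chomp $rec;             # chomp strips the trailing "request(", if any
        # Request name on the first line, value after "START_TIME," on the next.
        if ($rec =~ /^(.+)\n\s*START_TIME,\s*(.+)/ and index($1, $bad) < 0) {
            push @out, "$1:$2";
        }
    }
    return @out;
}

# Assumed sample data, including a header line that should be ignored.
my $data = "some header\nrequest(goodstring\nSTART_TIME,some goodness\n"
         . "request(BADSTRING\nSTART_TIME,some bad stuff\n"
         . "request(randomstring\nSTART_TIME,more goodness\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for records_from_fh($fh, 'BADSTRING');
close $fh;
```

The first chunk read is whatever precedes the first `request(` (here the header), and it simply fails the match.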
Re: Re: grab only if pattern matches previous line by rje (Deacon) on Jul 10, 2003 at 15:20 UTC
++graff, that was exactly what I was thinking. Let Perl grab a whole request at a time and match away.
Re: grab only if pattern matches previous line by pzbagel (Chaplain) on Jul 09, 2003 at 19:57 UTC
I'll give a ++ to Tomte because it does what you want based on your code, but the $matched flag is rather kludgy and may make your code harder to read down the road. There are some better options, which may be easier to decipher if you change your for loop a little. For instance: `my @lines = ( "request(goodstring", "START_TIME,some goodness", "request(BADSTRING", "START_TIME,some bad stuff", "request(randomstring", "START_TIME,more goodness", ); for my $x (0..$#lines) { if ($lines[$x] =~ /request/) { my (undef, $blah1) = split /\(/, $lines[$x]; if ($blah1 !~ /BADSTRING/) { my (undef, $blah2) = split /,/, $lines[$x+1]; print "$blah1:$blah2\n"; } } }` Here `$lines[$x+1]` gets the next line. Output: `goodstring:some goodness` and `randomstring:more goodness`. Since I use a counter to access elements in @lines, I can easily reference the next line of the input without keeping track of a flag. If you want to skip checking the line with $blah2 in it for /request/, you have to use the three-argument version of for (which is treated differently than the for loop with a list argument) and then increment the counter after you print, so that the next line is skipped: `for (my $x = 0; $x <= $#lines; $x++) { if ($lines[$x] =~ /request/) { my (undef, $blah1) = split /\(/, $lines[$x]; if ($blah1 !~ /BADSTRING/) { my (undef, $blah2) = split /,/, $lines[$x+1]; print "$blah1:$blah2\n"; $x++; } } }`
Re: Re: grab only if pattern matches previous line by sauoq (Abbot) on Jul 10, 2003 at 01:15 UTC
`(undef, $blah1) = split /\(/, $lines[$x];` That syntax is ugly and, more importantly, it doesn't scale. (Say, due to a change in input format, you need the ninth thing on the line instead of the second?) This `$blah1 = ( split /\(/, $lines[$x] )[1];` is much cleaner, I think. -sauoq "My two cents aren't worth a dime.";
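To make the scaling argument concrete, here is a small sketch of the list-slice form pulling the second and the ninth field. The helper name `nth_field` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Nth (zero-based) field of a delimited line.
sub nth_field {
    my ($line, $n, $sep) = @_;
    return ( split /\Q$sep\E/, $line )[$n];
}

my $line = 'a,b,c,d,e,f,g,h,i,j';        # assumed sample input
print nth_field($line, 1, ','), "\n";    # second field
print nth_field($line, 8, ','), "\n";    # ninth field

# A slice can also pull several fields in one go.
my ($second, $ninth) = ( split /,/, $line )[1, 8];
print "$second $ninth\n";
```

With the `(undef, ...)` form, reaching the ninth field would take eight placeholders on the left-hand side; with the slice only the index changes.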
Re: grab only if pattern matches previous line by greenFox (Vicar) on Jul 10, 2003 at 05:39 UTC
I may be misunderstanding completely, but it seems to me that if you re-word the question as "grab the next line if I get a match" then a cleaner (IMO) solution comes out. Also, you did not say what to do if two consecutive lines match your pattern: should the second one be ignored or treated in the same way? Below are two solutions, one for each case (you will need to adapt them for your specific data). `my @lines = qw(foo foo2foo bar bar2bar baz2 bell bell1bell); print "Method 1\n"; for my $i (0..$#lines){ if ( $lines[$i] =~ /2/ ){ print "$lines[$i], $lines[$i + 1]\n"; } } print "\n\nMethod 2\n"; for (my $i=0; $i <= $#lines; $i++){ if ( $lines[$i] =~ /2/ ){ print "$lines[$i], $lines[$i + 1]\n"; $i++; } }` -- Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho
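The difference between the two methods only shows up when matching lines are adjacent, so here is a small sketch that isolates it. The sub names and the four-element sample list are assumptions, and both loops stop one short of the end so that a match on the very last line never reaches past the array.

```perl
use strict;
use warnings;

# Method 1: every matching line is paired with its successor, so a line
# can show up both as a "next line" and as a match in its own right.
sub pairs_overlapping {
    my ($re, @lines) = @_;
    my @out;
    for my $i (0 .. $#lines - 1) {      # the last line has no successor
        push @out, "$lines[$i], $lines[$i + 1]" if $lines[$i] =~ $re;
    }
    return @out;
}

# Method 2: after a match the successor is consumed and not tested itself.
sub pairs_skipping {
    my ($re, @lines) = @_;
    my @out;
    for (my $i = 0; $i < $#lines; $i++) {
        if ($lines[$i] =~ $re) {
            push @out, "$lines[$i], $lines[$i + 1]";
            $i++;                       # skip the line we just used
        }
    }
    return @out;
}

my @lines = qw(a2 b2 c d2);             # assumed sample: two adjacent matches
print "$_\n" for pairs_overlapping(qr/2/, @lines);
print "--\n";
print "$_\n" for pairs_skipping(qr/2/, @lines);
```

On this sample the first sub reports both `a2, b2` and `b2, c`, while the second reports only `a2, b2`.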
Re: Re: grab only if pattern matches previous line by nkpgmartin (Sexton) on Jul 10, 2003 at 15:11 UTC
Hi, I should probably clarify a couple things. 1. the data file is large. 2. there is a lot of useless header information and data I want to skip (some of it messy). 3. It is true that for each case where there is a request(REQUEST_NAME the very next line will always be the START_TIME, so really just incrementing one line past the good requests would work (though it's probably not as safe). 4. the BADSTRING is not the entire string. Thanks!
Re: Re: Re: grab only if pattern matches previous line by greenFox (Vicar) on Jul 11, 2003 at 02:36 UTC
If your data file is really large then you should work on the data as you read it, line by line, rather than loading it all into memory. I used an array because that is what your sample code used :) `while (<FILE>) { next if /^matches messy header/; chomp; if (/matches some string/) { my $next_line = <FILE>; last unless defined $next_line; chomp $next_line; print "$_, $next_line\n"; } }` Do whatever you need to `$next_line` just before the print. -- Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho
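A self-contained sketch of that read-ahead loop, with the end-of-file check made explicit, might look like this. The sub name `grab_following`, the two pattern arguments, and the sample data are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: stream a filehandle, reading one line ahead on a match.
sub grab_following {
    my ($fh, $want, $skip) = @_;
    my @out;
    while (my $line = <$fh>) {
        next if $line =~ $skip;         # ignore header noise
        next unless $line =~ $want;
        chomp $line;
        my $next = <$fh>;               # read ahead one line
        last unless defined $next;      # EOF: nothing follows the match
        chomp $next;
        push @out, "$line, $next";
    }
    return @out;
}

# Assumed sample data: a header, one complete pair, and a dangling request.
my $data = "# header\nrequest(one\nSTART_TIME,t1\nrequest(two\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for grab_following($fh, qr/^request\(/, qr/^#/);
close $fh;
```

The dangling `request(two` at the end produces no output because the read-ahead returns undef there.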
Re: grab only if pattern matches previous line by pernod (Chaplain) on Jul 10, 2003 at 13:33 UTC
How about using negative lookahead? This way, you check whether what follows 'request(' starts with the BADSTRING, and the match fails if that is the case. `use strict; undef $/; my $data = <DATA>; my %matches = $data =~ /^request\((?!BADSTRING)([^\n]+)\n[^,]+,([^\n]+)/mg; foreach ( keys %matches ) { print "$_ : $matches{$_}\n"; }` Here `undef $/` sets up slurp mode, so `<DATA>` grabs the entire file into one string. Reading the regex from left to right: `^request\(` matches the start of a record; `(?!BADSTRING)` fails if BADSTRING comes next; `([^\n]+)` captures $blah1; `\n` matches the newline; `[^,]+,` matches anything up to the comma; and the final `([^\n]+)` captures $blah2, everything from the comma to the next \n. The `__DATA__` section holds: `request(goodstring START_TIME,some goodness request(BADSTRING START_TIME,some bad stuff request(randomstring START_TIME,more goodness request(whatever you are` This gives: `goodstring : some goodness randomstring : more goodness` (in no guaranteed order, since hash keys are unordered). Thanks to pzbagel for his excellent sample data :) This uses several neat tricks. /g finds every match in the string rather than just the first. /m lets ^ match at the start of each line, not only at the start of the string. Calling the regex in list context and assigning it to a hash gives us a nice little summary to dump afterwards. The disadvantages of this approach are of course the amount of memory used by `$data` and `%matches`, that malformed data may throw the /g matching off, and that the relative illegibility of the regex may be a problem. If your data samples aren't too big, it might work, though. Hope this helps, and I welcome (and appreciate) any criticism on my regex-programming style. pernod -- Mischief. Mayhem. Soap.
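Since the original poster notes that BADSTRING is not the entire request name, a variant worth sketching is a lookahead that rejects the bad string anywhere on the request line. This is a swapped-in variation, not pernod's regex: the sub name `ordered_matches` and the sample data are assumptions, and it collects pairs in an array so that input order and repeated names are preserved.

```perl
use strict;
use warnings;

# Hypothetical helper: return [name, value] pairs in input order.
sub ordered_matches {
    my ($text, $bad) = @_;
    my @pairs;
    while ($text =~ /^request\(          # start of record
                     (?![^\n]*\Q$bad\E)  # reject if the bad string is anywhere on the line
                     ([^\n]+) \n         # capture the request name
                     [^,\n]+ ,           # skip up to the comma on the next line
                     ([^\n]+)            # capture the rest of that line
                    /mgx) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Assumed sample data with the bad string embedded in a longer name.
my $data = "request(goodstring\nSTART_TIME,some goodness\n"
         . "request(my BADSTRING here\nSTART_TIME,some bad stuff\n"
         . "request(randomstring\nSTART_TIME,more goodness\n"
         . "request(whatever you are\n";
print "$_->[0] : $_->[1]\n" for ordered_matches($data, 'BADSTRING');
```

The memory cost is the same as before, because the whole input still has to sit in one string.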