nkpgmartin has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to find a way to grab a field only if another field on the previous line matches. The file format looks like this:

    request(blah1
    START_TIME, blah2 blah blah

I want to output blah1:blah2 but only if blah1 does *not* contain a certain string. My code looks something like this:

    foreach $line (@lines) {
        if ($line =~ /request/) {
            ($r1, $r2) = split(/\(/, $line);
            print $r2 unless $r2 =~ /BADSTRING/;
        }
    }

Now I want to say: if $r2 does not contain BADSTRING, then in each such case split the very next line (the START_TIME line) and grab blah2. Any suggestions? Thanks in advance.

Replies are listed 'Best First'.
Re: grab only if pattern matches previous line
by Tomte (Priest) on Jul 09, 2003 at 19:22 UTC

    Untested, but maybe a hint to a better solution :)

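    my $matched = 0;
    my $blah1;
    foreach (@lines) {
        # A request line: remember its name and flag the next line as interesting
        if (m/^request\((.+)$/) {
            $blah1   = $1;
            $matched = 1;
            next;
        }
        # Only look at START_TIME when the previous line was a request
        if ($matched && m/^\s*START_TIME,\s*(.+)$/) {
            print $blah1, ':', $1, "\n";
        }
        $matched = 0;
    }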

    regards,
    tomte


    Hlade's Law:

    If you have a difficult task, give it to a lazy person --
    they will find an easier way to do it.

Re: grab only if pattern matches previous line
by Albannach (Monsignor) on Jul 09, 2003 at 21:39 UTC
    This is a slightly different take on Tomte's concept, remembering that we only want it to catch specific request lines. Though pzbagel's approach is also good, this one has the advantage of not needing the data in an array (you could read it directly from a file), and not requiring the following line(s) of interest to be at fixed offsets from the 'request' line. In pzbagel's version you could read ahead if using a file, but that's a bit more messy as you need to check for EOF at each read. How you end up doing it will depend on further examination of your data, and some consideration of what else you might have to do with it later.
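    my $buf;
    for my $line (@lines) {
        if ($line =~ /^request\((.+)$/) {
            $buf = $1;
            undef $buf if $buf =~ /BADSTRING/;   # forget requests we don't want
        }
        elsif (defined $buf) {
            # first line after a wanted request: print the field after the comma
            print "$buf: ", (split ', ', $line)[1], "\n";
            undef $buf;
        }
    }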
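A minimal sketch of the "read it directly from a file" variant mentioned above, assuming the two-line layout from the question. The sub name `pairs_from_fh` and the in-memory filehandle are illustrative assumptions, not part of the original post; a real script would open the data file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from any filehandle and returns
# "name:value" strings for requests whose name does not contain BADSTRING.
sub pairs_from_fh {
    my ($fh) = @_;
    my ($buf, @out);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^request\((.+)$/) {
            my $name = $1;
            # Remember the request name unless it contains BADSTRING.
            $buf = $name =~ /BADSTRING/ ? undef : $name;
        }
        elsif (defined $buf) {
            # First line after a wanted request: keep the text after the first comma.
            my (undef, $value) = split /,\s*/, $line, 2;
            push @out, "$buf:$value" if defined $value;
            undef $buf;
        }
    }
    return @out;
}

# Demo data held in a string (an assumption made for the example).
my $demo = join "\n",
    "header junk",
    "request(goodstring",    "START_TIME, some goodness",
    "request(xxBADSTRINGxx", "START_TIME, some bad stuff",
    "request(randomstring",  "START_TIME, more goodness",
    "";
open my $demo_fh, '<', \$demo or die "cannot open in-memory handle: $!";
print "$_\n" for pairs_from_fh($demo_fh);
close $demo_fh;
```

Only one line is held in memory at a time, which is the point of this variant when the data file is large.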

    --
    I'd like to be able to assign to an luser

Re: grab only if pattern matches previous line
by graff (Chancellor) on Jul 10, 2003 at 00:58 UTC
    I don't think Tomte's solution is kludgy at all (and I really like Albannach's version of it). Another possibility that might work for you, depending on what the data really look like, could go like this:

    {
        local $/ = "request(";    # change the input record separator
        while (<>) {
            # $1 is the request name; skip it if it contains BADSTRING
            if ( /^(.+)\s+START_TIME, (.*)/ and $1 !~ /BADSTRING/ ) {
                print "$1:$2\n";
            }
        }
    }

    This assumes that the data stream is a series of records that all start with "request(" and have line breaks as indicated in your example (because /(.*)/ does not match a new-line character). But if your data varies from that a bit, it might still be pretty easy to adapt this idea to handle it.
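To see what the record-separator trick actually hands to the loop, here is a small sketch under the same assumption about the data layout. The sub name `records` and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into records that end at "request(".
sub records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "request(";    # each read now ends at the next "request("
    my @records = <$fh>;
    chomp @records;           # chomp removes the current $/ value, i.e. "request("
    close $fh;
    return @records;
}

my $sample = "some header\nrequest(goodstring\nSTART_TIME, some goodness\n"
           . "request(myBADSTRING\nSTART_TIME, some bad stuff\n";

for my $rec ( records($sample) ) {
    (my $shown = $rec) =~ s/\n/\\n/g;    # make the line breaks visible
    print "[$shown]\n";
}
```

Everything before the first "request(" arrives as a record of its own, so leading header material is simply a record that the START_TIME pattern does not match.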
      ++graff, that was exactly what I was thinking. Let Perl grab a whole request at a time and match away.
Re: grab only if pattern matches previous line
by pzbagel (Chaplain) on Jul 09, 2003 at 19:57 UTC

    I'll give a ++ to Tomte because it does what you want based on your code, but the $matched flag thing is rather kludgy. It may make your code harder to read down the road. There are some better options which may be easier to decipher if you change your for loop a little. For instance:

    @lines = (
        "request(goodstring",
        "START_TIME,some goodness",
        "request(BADSTRING",
        "START_TIME,some bad stuff",
        "request(randomstring",
        "START_TIME,more goodness",
    );

    for my $x (0..$#lines) {
        if ($lines[$x] =~ /request/) {
            (undef, $blah1) = split /\(/, $lines[$x];
            if ($blah1 !~ /BADSTRING/) {
                (undef, $blah2) = split /,/, $lines[$x+1];   #<---Plus 1 gets next line
                print "$blah1:$blah2\n";
            }
        }
    }

    #################
    ## Output
    goodstring:some goodness
    randomstring:more goodness

    Since I use a counter to access elements in @lines, I can easily reference the next line of the input without keeping track of a flag. If you want to skip checking the line with $blah2 in it for /request/ you have to use the three argument version of for and then increment the counter after you print:

    # Three arguments are treated differently than using
    # the for loop with the list argument
    for ($x=0; $x<=$#lines; $x++) {
        if ($lines[$x] =~ /request/) {
            (undef, $blah1) = split /\(/, $lines[$x];
            if ($blah1 !~ /BADSTRING/) {
                (undef, $blah2) = split /,/, $lines[$x+1];
                print "$blah1:$blah2\n";
                $x++;    #<--------Increment to skip the next line
            }
        }
    }

      (undef, $blah1) = split /\(/, $lines[$x];

      That syntax is ugly and, more importantly, it doesn't scale. (Say, due to a change in input format, you need the ninth thing on the line instead of the second?) This

      $blah1 = ( split /\(/, $lines[$x] )[1];
      is much cleaner, I think.
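A quick sketch of how the list-slice form scales, using a made-up ten-field line (the data and variable names are assumptions for illustration only):

```perl
use strict;
use warnings;

my $line = "a,b,c,d,e,f,g,h,i,j";    # hypothetical ten-field line

my $second = ( split /,/, $line )[1];    # index 1 is the second field
my $ninth  = ( split /,/, $line )[8];    # index 8 is the ninth field

# A slice can also pick several fields at once; -1 is the last field.
my ($first, $last) = ( split /,/, $line )[0, -1];

print "$second $ninth $first $last\n";   # prints "b i a j"
```

Moving from the second field to the ninth means changing one index, rather than adding seven more `undef` placeholders on the left-hand side.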

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: grab only if pattern matches previous line
by greenFox (Vicar) on Jul 10, 2003 at 05:39 UTC

    I may be misunderstanding completely but it seems to me that if you re-word the question as "grab the next line if I get a match" then a cleaner (IMO) solution comes out. Also you did not say what to do if two consecutive lines match your pattern, should it be ignored or treated in the same way? Below are two solutions which handle each way. (you will need to adapt for your specific data)

    my @lines = qw(foo foo2foo bar bar2bar baz2 bell bell1bell);

    print "Method 1\n";
    for my $i (0..$#lines){
        if ( $lines[$i] =~ /2/ ){
            print "$lines[$i], $lines[$i + 1]\n";
        }
    }

    print "\n\nMethod 2\n";
    for (my $i=0; $i <= $#lines; $i++){
        if ( $lines[$i] =~ /2/ ){
            print "$lines[$i], $lines[$i + 1]\n";
            $i++;    # skip the line we just printed as "next"
        }
    }

    --
    Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

      Hi, I should probably clarify a couple things. 1. the data file is large. 2. there is a lot of useless header information and data I want to skip (some of it messy). 3. It is true that for each case where there is a request(REQUEST_NAME the very next line will always be the START_TIME, so really just incrementing one line past the good requests would work (though it's probably not as safe). 4. the BADSTRING is not the entire string. Thanks!

        If your data file is really large then you should process the data line by line as you read it, rather than slurping it all into memory first. I used an array because that is what your sample code used :)

        while(<FILE>){
            next if /^matches messy header/;
            chomp;
            if (/matches some string/){
                my $next_line = <FILE>;
                # do something to $next_line here
                print "$_, $next_line\n";
            }
        }
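Here is one way the read-ahead idea might look for the request/START_TIME data, including the end-of-file check that a read-ahead needs. This is a sketch under the assumptions stated in the thread (START_TIME always directly follows a request line); the sub name `good_requests` and the demo data are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "name:value" for every wanted request.
sub good_requests {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        # Anything that is not a request line (headers, noise) is skipped.
        next unless $line =~ /^request\((.+)$/;
        my $name = $1;

        my $next = <$fh>;              # read ahead one line
        last unless defined $next;     # EOF directly after a request line

        # For unwanted requests the START_TIME line is consumed and dropped.
        next if $name =~ /BADSTRING/;

        chomp $next;
        my $value = ( split /,\s*/, $next, 2 )[1];
        push @out, "$name:$value" if defined $value;
    }
    return @out;
}

# Demo data in a string; a real script would open the large file instead.
my $input = "messy header\nrequest(goodstring\nSTART_TIME, some goodness\n"
          . "request(aBADSTRINGb\nSTART_TIME, some bad stuff\n"
          . "request(dangling\n";
open my $in, '<', \$input or die "cannot open in-memory handle: $!";
print "$_\n" for good_requests($in);
close $in;
```

The trade-off against a flag variable is that the read-ahead silently depends on the START_TIME line always being the very next line.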

        --
        Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re: grab only if pattern matches previous line
by pernod (Chaplain) on Jul 10, 2003 at 13:33 UTC

    How about using negative lookahead? This way, you check to see whether what follows 'request(' actually contains the BADSTRING, and fails if that is the case.

    use strict;

    undef $/;             # Set up for slurp mode
    my $data = <DATA>;    # Grab the entire file to a string.

    my %matches = $data =~
        /^request\(              # start of record
         (?![^\n]*BADSTRING)     # Fails if BADSTRING is present anywhere in the name
         ([^\n]+)                # Captures $blah1
         \n                      # Match the newline
         [^,]+,                  # Match anything up to the comma
         ([^\n]+)                # Capture $blah2. Everything from
        /mgx;                    # the comma to the next \n

    foreach ( sort keys %matches ) {
        print "$_ : $matches{$_}\n";
    }

    __DATA__
    request(goodstring
    START_TIME,some goodness
    request(BADSTRING
    START_TIME,some bad stuff
    request(randomstring
    START_TIME,more goodness
    request(whatever you are

    This gives:

    goodstring : some goodness
    randomstring : more goodness

    Thanks to pzbagel for his excellent sample data :)

    This uses several neat tricks. /g finds every match in the string rather than stopping at the first. /m lets ^ match at the start of each line inside the string, not only at the very beginning. /x allows the whitespace and comments inside the pattern. Calling the regex in list context and assigning it to a hash gives us a nice little summary to dump afterwards.

    The disadvantages of this approach are, of course, the amount of memory used by $data and %matches. Malformed data may break the /g matching, and the relative illegibility of the regex may be a problem. If your data samples aren't too big, it might work, though.
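One more thing to keep in mind with the hash: a request name that occurs more than once keeps only its last value, and the original order is lost. A possible variation, sketched here with a made-up sub name `ordered_pairs` and invented sample data, is to walk the matches with /g in scalar context and collect them in an array instead:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [name, value] pairs in input order.
sub ordered_pairs {
    my ($text) = @_;
    my @pairs;
    # In scalar context, /g resumes after the previous match on each iteration.
    while ( $text =~ /^request\((?![^\n]*BADSTRING)([^\n]+)\n[^,\n]+,([^\n]+)/mg ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $text = "request(goodstring\nSTART_TIME,some goodness\n"
         . "request(BADSTRING\nSTART_TIME,some bad stuff\n"
         . "request(goodstring\nSTART_TIME,more goodness\n";

print "$_->[0] : $_->[1]\n" for ordered_pairs($text);
```

This still holds the whole input in memory, so the size caveat above applies unchanged.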

    Hope this helps, and I welcome (and appreciate) any criticism on my regex-programming style.

    pernod
    --
    Mischief. Mayhem. Soap.