Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Being new to perl and having limited resources (time-limited dialup), I could use a simple example (if there is one) of how to parse a file and print the data located between two things matching regular expressions.
log.log:

    [moo]
    data1
    data2
    data3
    [cow]

and print out:

    Items found under [moo]
    data1
    data2
    data3
Thanks in advance.

edited: Wed Oct 8 03:57:40 2003 by jeffa - code tags

Re: print data between two regular expressions
by jeffa (Bishop) on Oct 08, 2003 at 04:13 UTC
    One of my favorite ways to print out only the lines of data between delimiters is with the range operator (..), also called the "flip-flop" op. If we drop the initial print (Items found under [moo]), this can be done with a fairly straightforward one-liner:
    perl -ne'print if /^\[moo\]/../^\[cow\]/ and !/^\[/' log.log
    But let's go ahead and blow this up:
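    use strict;
    use warnings;

    open my $fh, '<', 'log.log' or die "can't open log.log: $!\n";
    print "Items found under [moo]\n";
    while (<$fh>) {
        # true from the [moo] line through the [cow] line, inclusive
        if (/^\[moo\]/ .. /^\[cow\]/) {
            print unless /^\[/;    # don't print the delimiter lines
        }
    }
    close $fh;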
    Basically, the left-hand side of the .. op is the delimiter that turns printing on when it is found, and the right-hand side is the delimiter that turns printing off when it is found. Both delimiter lines fall inside the range, so since you don't want to print the delimiters themselves, you have to add a condition that skips them. I picked anything that starts with a left bracket; you will have to pick something better if your data can start with one.
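A minimal sketch of the same loop run over an in-memory list instead of log.log (the sample lines are made up). Instead of filtering on a leading bracket, it uses the value the range operator itself returns: in scalar context .. yields a sequence number, 1 on the line that opens the range, with "E0" appended on the line that closes it, so both delimiters can be skipped without guessing what the data looks like.

```perl
use strict;
use warnings;

# Assumed sample data, one item per line; note "[data2]" starts with a bracket.
my @lines = ('junk', '[moo]', 'data1', '[data2]', 'data3', '[cow]', 'more junk');

my @found;
for (@lines) {
    # In scalar context .. returns "" outside the range, 1 on the opening
    # line, 2, 3, ... inside it, and e.g. "5E0" on the closing line.
    my $seq = /^\[moo\]/ .. /^\[cow\]/;
    next unless $seq;                       # outside the range
    next if $seq == 1 or $seq =~ /E0\z/;    # skip the two delimiter lines
    push @found, $_;
}

print "Items found under [moo]\n";
print "$_\n" for @found;
```

With this sample data the loop collects data1, [data2] and data3; a !/^\[/ test would have dropped [data2].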

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      For the case where it isn't known that [cow] follows [moo], you could also use the ... range operator to make the solution more general. The following prints the lines from [moo] up to the next line starting with [, excluding those delimiter lines themselves:
      perl -ne'print if /^\[moo\]/ ... /^\[/ and !/^\[/' log.log

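      The 3-dot operator ... works like the 2-dot operator .. except that the right operand isn't tested on the same evaluation in which the left operand became true; it is first tested on the next line. That matters here because the [moo] line itself matches /^\[/, so with .. the range would open and close on that single line.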
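A small sketch contrasting the two operators on made-up lines, where the end pattern /^\[/ also matches the opening [moo] line:

```perl
use strict;
use warnings;

# Assumed sample lines; /^\[/ matches [moo] as well as [cow].
my @lines = ('[moo]', 'data1', 'data2', '[cow]', 'data9');

my (@two, @three);
for (@lines) {
    # Two dots: the end test also runs on the line that opened the range,
    # so [moo] both opens and closes it.
    push @two, $_ if /^\[moo\]/ .. /^\[/;
    # Three dots: the end test is skipped on the line that opened the range.
    push @three, $_ if /^\[moo\]/ ... /^\[/;
}

print "..  : @two\n";
print "... : @three\n";
```

With .. only the [moo] line is collected; with ... the range runs through [cow], and the !/^\[/ filter in the one-liner then drops the two delimiter lines.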

      --
      John.

      And here's a solution that doesn't depend on the presence or absence of a specific pattern in the intermediate lines of data:
      perl -ne 'print if (/^\[moo]/ and $c = 0, 1) .. (/^\[cow]/ and $c = 0, 1) and $c ++' log.log
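Here is the same trick written out as a script over an assumed list of lines, to show why it works. Each delimiter test resets a counter when it matches, and the post-increment's old value is 0 only on the two delimiter lines, so nothing about the data lines themselves is inspected:

```perl
use strict;
use warnings;

# Assumed sample data; "[data2]" would defeat a !/^\[/ filter.
my @lines = ('junk', '[moo]', 'data1', '[data2]', 'data3', '[cow]', 'junk');

my $c;
my @found;
for (@lines) {
    # When a delimiter matches, $c is reset to 0 and the test is still true:
    # the comma operator returns its last operand, 1.
    my $in = (/^\[moo\]/ and ($c = 0, 1)) .. (/^\[cow\]/ and ($c = 0, 1));
    # $c++ returns the old value: 0 (false) on the [moo] and [cow] lines,
    # 1, 2, 3, ... (true) on the lines in between.
    push @found, $_ if $in and $c++;
}

print "$_\n" for @found;
```

In the one-liner, and binds more loosely than the comma, so (/^\[moo]/ and $c = 0, 1) already groups as /^\[moo]/ and ($c = 0, 1); the explicit parentheses here only make that visible.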

      Abigail

Re: print data between two regular expressions
by davido (Cardinal) on Oct 08, 2003 at 02:18 UTC
Re: print data between two regular expressions
by diotalevi (Canon) on Oct 08, 2003 at 00:56 UTC

    After a successful match, the $+[0] variable holds the offset into the original string just past the end of the match. Similarly, the $-[0] variable holds the offset where the match started. If you then combine those two pieces of information with substr(), you'll be able to extract all the text between your two expression matches.
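To make the offsets concrete, a short sketch on an assumed string in which [cow] does come after [moo]:

```perl
use strict;
use warnings;

# Assumed sample string; [cow] follows [moo].
my $str = 'head[moo]middle[cow]tail';

my ($start, $end) = (0, 0);
# After a successful match, $+[0] is the offset just past the end of the match...
$start = $+[0] if $str =~ /\[moo\]/;
# ...and $-[0] is the offset where the match began.
$end = $-[0] if $str =~ /\[cow\]/;

# The length argument to substr is end minus start.
my $between = substr $str, $start, $end - $start;
print "$start $end $between\n";
```

This prints 9 15 middle. Both matches start searching at offset 0, so the result is only right when the end pattern's first occurrence lies after the start pattern's.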

    my $str   = 'a' . ('_' x 100) . 'b';
    my $start = $str =~ /a/ ? $+[0] : 0;
    my $end   = $str =~ /b/ ? $-[0] : 0;
    my $ext   = substr $str, $start, $start - $end;
    print $ext;
      You're assuming the first occurrence of "b" comes after the first occurrence of "a". Watch it break, even with davis's fix applied:
      my $str   = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
      my $start = $str =~ /a+/ ? $+[0] : 0;
      my $end   = $str =~ /b/  ? $-[0] : 0;
      my $ext   = substr $str, $start, $end - $start;
      $, = " | ";
      $\ = "\n";
      print $start, $end, $ext;
      Which prints:
      8 | 1 | ____________________bccc
      

      You need to continue the second search where the first one left off. Adding the //g switch to both regexps can do that, provided you make sure pos is clear before you start on the first one. A failed match can take care of that. I'm not sure that's absolutely necessary, but you never know... It depends on what you matched against $str before, and on whether perl properly localises pos to the current block. (Note: it makes no difference for this particular string whether you do it or not, but I'm trying to cover all possibilities in general. I want to make sure the first regexp always starts searching from the start of the string.)

      my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
      $str =~ /(?!)/g;    # A match that always fails, resetting pos()
      my $start = $str =~ /a+/g ? $+[0] : 0;
      my $end   = $str =~ /b/g  ? $-[0] : 0;
      my $ext   = substr $str, $start, $end - $start;
      $, = " | ";
      $\ = "\n";
      print $start, $end, $ext;
      resulting in:
      8 | 28 | ____________________
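A sketch of how pos() carries over between the two scalar-context //g matches on the same string. It also shows assigning undef to pos(), which resets the search position explicitly instead of relying on a match that always fails:

```perl
use strict;
use warnings;

my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';

my $found_a = $str =~ /a+/g;    # scalar-context //g: pos() is left at the match end
my $after_a = pos $str;         # 8

my $found_b = $str =~ /b/g;     # resumes at offset 8, skipping the 'b' at offset 1
my $next_b  = $-[0];            # 28

pos($str) = undef;              # explicit reset of the search position
my $again   = $str =~ /b/g;     # searches from the start of the string again
my $first_b = $-[0];            # 1

print "$after_a $next_b $first_b\n";
```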
      

        Oh right. Well, it was a quickie answer anyway (at least I got the explanation right). I'd avoid /g there, though, since the OP might accidentally call it in list context and screw it all up. I'd probably just do the later match against a substr() of the rest of the string, like my $end = $start + (substr( $str, $start ) =~ /b/ ? $-[0] : 0); Roughly like that anyway. I didn't mention that I tested none of this; I'm just writing it and figuring that I know how to speak perl correctly.

        I also have an aversion to pos() since I know that its behaviour is currently undefined with regard to local(). In this case that "bug" isn't relevant so I suppose I could go along with a use of pos(). Exactly what you'd do with it though... I dunno.

        # Maybe this. I think I'd rather just access $+[0] directly.
        my $start  = $str =~ /a/g ? pos $str : 0;
        my $end    = $str =~ /b/g ? $-[0] : 0;
        my $length = $end > $start ? $end - $start : 0;
        my $ext    = $length ? substr( $str, $start, $length ) : undef;
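Wrapping the substr() idea from the reply above in a subroutine might look like the sketch below. The name text_between is hypothetical, and returning undef when either pattern is missing (rather than falling back to offset 0) is a choice made here, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: text between the end of the first $start_re match
# and the start of the next $end_re match after it; undef if either is missing.
sub text_between {
    my ($str, $start_re, $end_re) = @_;
    return unless $str =~ $start_re;
    my $start = $+[0];
    # Match only against the tail of the string, so $-[0] is relative to $start.
    return unless substr($str, $start) =~ $end_re;
    my $end = $start + $-[0];
    return substr $str, $start, $end - $start;
}

my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
my $ext = text_between($str, qr/a+/, qr/b/);
print defined $ext ? "[$ext]\n" : "no match\n";
```

Because the second search only sees the text after the first match, the stray 'b' before the a's is never considered, and no pos() or //g is involved.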

      Ok, I'm not sure if I'm misreading the OP's spec, or you are.
      I think you mean:

      my $ext = substr $str, $start, $end - $start;
      (swapped end and start) - cheers

      davis
      It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
Re: print data between two regular expressions
by Anonymous Monk on Oct 09, 2003 at 04:46 UTC
    Thanks for all the different explanations! Your help (all of you, rather) has been wonderful. -Anon