print data between two regular expressions

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: print data between two regular expressions by jeffa (Bishop) on Oct 08, 2003 at 04:13 UTC
One of my favorite ways to print only the lines of data between delimiters is the range operator (`..`), also called the "flip-flop" operator. If we drop the initial print (Items found under [moo]), this can be done with a fairly straightforward one-liner: `perl -ne'print if /^\[moo\]/../^\[cow\]/ and !/^\[/' log.log`

But let's go ahead and blow this up: `use strict; use warnings; open FH, '<', 'log.log' or die "can't open file\n"; print "Items found under [moo]\n"; while (<FH>) { if (/^\[moo\]/../^\[cow\]/) { print unless /^\[/; } }`

Basically, the left-hand side of the `..` operator is the delimiter that starts the printing, and the right-hand side is the delimiter that stops it. Since you don't want to print the delimiters themselves, you have to add a condition that skips them. I picked anything that starts with a left bracket; you will have to pick something better if your data lines can start with one.

jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)
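To see the flip-flop at work without a log file, here is a minimal sketch that runs the same test over an in-memory list. The sample lines and the `@between` array are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for log.log.
my @lines = map { "$_\n" } '[moo]', 'milk', 'grass', '[cow]', 'hay', '[pig]', 'mud';

my @between;
for (@lines) {
    # The flip-flop turns true on the [moo] line and stays true
    # through the [cow] line, after which it resets to false.
    if (/^\[moo\]/ .. /^\[cow\]/) {
        push @between, $_ unless /^\[/;   # skip the delimiter lines
    }
}

print @between;   # prints "milk" and "grass", one per line
```

Collecting into an array instead of printing directly is only a convenience here; it makes the result easy to inspect afterwards.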
Re: Re: print data between two regular expressions by jmcnamara (Monsignor) on Oct 08, 2003 at 09:19 UTC
For the case where it isn't known that `[cow]` follows `[moo]`, you could also use the `...` range operator to make the solution more general. The following prints the lines from `[moo]` up to the next line starting with `[`: `perl -ne'print if /^\[moo\]/ ... /^\[/ and !/^\[/' log.txt` The three-dot operator `...` works like the two-dot operator `..` except that the right operand isn't tested until the next evaluation. -- John.
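The difference between the two operators shows up whenever the start line also matches the end pattern, as `[moo]` does against `/^\[/`. A small sketch, using invented sample lines, that counts how many lines each form keeps in range (delimiters included):

```perl
use strict;
use warnings;

# Hypothetical sample data; note that no [cow] line follows [moo].
my @lines = map { "$_\n" } '[moo]', 'milk', 'grass', '[pig]', 'mud';

my (@two_dot, @three_dot);
for (@lines) {
    # ".." tests the right operand on the same line that matched the left,
    # so the [moo] line both opens and closes the range.
    push @two_dot, $_ if /^\[moo\]/ .. /^\[/;
}
for (@lines) {
    # "..." waits until the next line before testing the right operand,
    # so the range runs on to the [pig] line.
    push @three_dot, $_ if /^\[moo\]/ ... /^\[/;
}

print "two dots:   ", scalar(@two_dot),   " line(s)\n";   # 1
print "three dots: ", scalar(@three_dot), " line(s)\n";   # 4
```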
Re: print data between two regular expressions by Abigail-II (Bishop) on Oct 08, 2003 at 11:10 UTC
And here's a solution that doesn't depend on the presence or absence of a specific pattern in the intermediate lines of data: `perl -ne 'print if (/^\[moo]/ and $c = 0, 1) .. (/^\[cow]/ and $c = 0, 1) and $c ++' log.log` Abigail
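The one-liner is dense, so here is the same expression spread over a loop as a rough sketch. The sample lines are invented, and one of them deliberately starts with a bracket to show that it still gets printed. The counter `$c` is reset to 0 on each delimiter line, and the post-increment `$c++` is false exactly on those lines.

```perl
use strict;
use warnings;

# Hypothetical sample data with a bracketed line inside the range.
my @lines = map { "$_\n" } '[moo]', 'milk', '[note] in brackets', 'grass', '[cow]', 'hay';

my @between;
my $c = 0;   # counts lines seen since the last delimiter
for (@lines) {
    # Each operand resets $c when its pattern matches and then yields 1
    # (the comma operator returns its right-hand value).
    my $in_range = (/^\[moo]/ and $c = 0, 1) .. (/^\[cow]/ and $c = 0, 1);
    # $c++ returns the old value: 0 (false) on a delimiter line,
    # non-zero (true) on every line after it.
    push @between, $_ if $in_range and $c++;
}

print @between;   # milk, "[note] in brackets", grass
```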
Re: print data between two regular expressions by davido (Cardinal) on Oct 08, 2003 at 02:18 UTC
Looking in the Q&A section I found the following: "How do I extract all text between two keywords like start and end?" I think you'll find some good solutions there. Dave "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
Re: print data between two regular expressions by diotalevi (Canon) on Oct 08, 2003 at 00:56 UTC
After a successful match, `$+[0]` holds the offset into the original string at which the match ended. Similarly, `$-[0]` holds the offset at which the match started. If you combine those two pieces of information with substr(), you'll be able to extract all the text between your two expression matches. `my $str = 'a' . ('_' x 100) . 'b'; my $start = $str =~ /a/ ? $+[0] : 0; my $end = $str =~ /b/ ? $-[0] : 0; my $ext = substr $str, $start, $start - $end; print $ext;`
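For a single match, the two offsets can be inspected directly. This is a small sketch on a made-up string; `$from`, `$to` and `$matched` are illustrative names only.

```perl
use strict;
use warnings;

my $str = 'xxSTARTpayloadENDyy';   # hypothetical sample string

my ($from, $to, $matched);
if ($str =~ /START/) {
    # @- and @+ are only meaningful after a successful match.
    ($from, $to) = ($-[0], $+[0]);
    # substr takes an offset and a length, so the length is $to - $from.
    $matched = substr $str, $from, $to - $from;
}

print "from=$from to=$to matched=$matched\n";   # from=2 to=7 matched=START
```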
Re: Re: print data between two regular expressions by bart (Canon) on Oct 08, 2003 at 09:02 UTC
You're assuming the first occurrence of the "b" is after the first occurrence of the "a". Watch it break, even with the fix from davis applied: `my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc'; my $start = $str =~ /a+/ ? $+[0] : 0; my $end = $str =~ /b/ ? $-[0] : 0; my $ext = substr $str, $start, $end - $start; $, = " | "; $\ = "\n"; print $start, $end, $ext;` Which prints: `8 | 1 | ____________________bccc`

You need to continue the second search where the first one left off. Adding the //g switch to both regexps can do that, provided you make sure pos is clear before you start on the first one. A failed match can take care of that. I'm not sure that's absolutely necessary, but you never know... It depends on what you matched on $str before, and on whether pos gets properly localised to the current block by perl. (Note: it doesn't make a difference whether you do it or not for this particular string, but I'm trying to cover all possibilities in general. I want to make sure the first regexp always starts searching from the start of the string.)

`my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc'; $str =~ /(?!)/g; # A match that always fails, resetting pos() my $start = $str =~ /a+/g ? $+[0] : 0; my $end = $str =~ /b/g ? $-[0] : 0; my $ext = substr $str, $start, $end - $start; $, = " | "; $\ = "\n"; print $start, $end, $ext;` resulting in: `8 | 28 | ____________________`
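One way to sidestep the question of a stale pos() is to do the matching on a private copy of the string, since pos() belongs to the variable and is not copied along with the value. The helper below is a hypothetical sketch along those lines; the name `between` and its interface are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: text between the first match of $open and the
# next match of $close after it, or undef if either one is missing.
sub between {
    my ($str, $open, $close) = @_;     # $str is a fresh copy, so pos() starts unset
    return unless $str =~ /$open/g;    # scalar-context //g records pos($str)
    my $start = $+[0];
    return unless $str =~ /$close/g;   # resumes searching at pos($str)
    return substr $str, $start, $-[0] - $start;
}

my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
my $ext = between($str, qr/a+/, qr/b/);
print length($ext), "\n";   # 20
```

Both matches sit in the condition of an `unless`, so they run in scalar context, which is what makes `//g` remember its position.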
Re: Re: Re: print data between two regular expressions by diotalevi (Canon) on Oct 08, 2003 at 15:50 UTC
Oh right. Well it was a quickie answer anyway (at least I got the explanation right). I'd avoid /g there though since the OP might accidentally call it in list context and screw it all up. I'd probably just do the later match against a substr() lvalue like `my $end = $start + (substr( $str, $start ) =~ /b/ ? $-[0] : 0);`. Roughly like that anyway. I didn't mention that I tested none of this - I'm just writing it and figuring that I know how to speak perl correctly. I also have an aversion to pos() since I know that its behaviour is currently undefined with regard to local(). In this case that "bug" isn't relevant so I suppose I could go along with a use of pos(). Exactly what you'd do with it though... I dunno. Maybe this, though I think I'd rather just access $+[0] directly: `my $start = $str =~ /a/g ? pos $str : 0; my $end = $str =~ /b/g ? $-[0] : 0; my $length = $end > $start ? $end - $start : 0; my $ext = $length ? substr( $str, $start, $length ) : undef;`
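The substr() idea can be checked against bart's test string. This sketch copies the tail into a separate variable (`$tail`, an illustrative name) rather than matching on the substr() call itself, which keeps the offset arithmetic easy to follow.

```perl
use strict;
use warnings;

my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';

my $start = $str =~ /a+/ ? $+[0] : 0;

# Match only against the tail after $start; $-[0] is then relative to
# that tail, so add $start back to get an offset into $str.
my $tail = substr $str, $start;
my $end  = $start + ($tail =~ /b/ ? $-[0] : 0);

my $ext = substr $str, $start, $end - $start;
print "$start $end ", length($ext), "\n";   # 8 28 20
```

If the second pattern does not match, `$end` equals `$start` and the extract is an empty string, which may or may not be what the caller wants.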
Re: Re: print data between two regular expressions by davis (Vicar) on Oct 08, 2003 at 08:47 UTC
~~Ok, I'm not sure if I'm misreading the OP's spec, or you are.~~ I think you mean: `my $ext = substr $str, $start, $end - $start;` (swapped end and start) - cheers davis It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
Re: print data between two regular expressions by Anonymous Monk on Oct 09, 2003 at 04:46 UTC
Thanks for all the different explanations! Your (all of you rather) help has been wonderful. -Anon