in reply to print data between two regular expressions

After a successful match, $+[0] holds the offset into the original string at which the match ended, and $-[0] holds the offset at which the match started. Combine those two offsets with substr() and you can extract all the text between your two expression matches.

my $str = 'a' . ('_' x 100) . 'b';
my $start = $str =~ /a/ ? $+[0] : 0;
my $end = $str =~ /b/ ? $-[0] : 0;
my $ext = substr $str, $start, $start - $end;
print $ext;
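
To make the two offsets concrete, here is a minimal sketch on a short string. The sample string and the variable names are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only so the offsets are easy to count.
my $sample = 'foo=bar;';

my ( $match_start, $match_end );
if ( $sample =~ /=/ ) {
    $match_start = $-[0];    # offset where the match begins
    $match_end   = $+[0];    # offset just past the end of the match
}

print "start=$match_start end=$match_end\n";    # prints "start=3 end=4"
```

The "=" sits at offset 3, so the match starts at 3 and ends just before offset 4.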

Re: Re: print data between two regular expressions
by bart (Canon) on Oct 08, 2003 at 09:02 UTC
    You're assuming the first occurrence of the "b" is after the first occurrence of the "a". Watch it break — after the fix as done by davis:
    my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
    my $start = $str =~ /a+/ ? $+[0] : 0;
    my $end = $str =~ /b/ ? $-[0] : 0;
    my $ext = substr $str, $start, $end - $start;
    $, = " | ";
    $\ = "\n";
    print $start, $end, $ext;
    Which prints:
    8 | 1 | ____________________bccc
    

    You need to continue the second search where the first one left off. Adding the /g modifier to both regexps does that, provided you make sure pos is clear before you start on the first one. A failed match can take care of that. I'm not sure that's absolutely necessary, but you never know... It depends on what you matched against $str before, and on whether perl properly localises pos to the current block. (Note: it makes no difference whether you do it for this particular string, but I'm trying to cover all possibilities in general. I want to make sure the first regexp always starts searching from the start of the string.)

    my $str = 'xbaaaaaa' . ('_' x 20) . 'bcccccccccc';
    $str =~ /(?!)/g; # A match that always fails, resetting pos()
    my $start = $str =~ /a+/g ? $+[0] : 0;
    my $end = $str =~ /b/g ? $-[0] : 0;
    my $ext = substr $str, $start, $end - $start;
    $, = " | ";
    $\ = "\n";
    print $start, $end, $ext;
    resulting in:
    8 | 28 | ____________________
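
A minimal sketch of the pos() reset discussed in this reply, for anyone who wants to see it happen. The earlier /c/g match is made up here to simulate a stale pos. Assigning undef to pos() is shown as an alternative to the always-failing match; it is not something bart proposed.

```perl
use strict;
use warnings;

my $str = 'xbaaaaaa' . ( '_' x 20 ) . 'bcccccccccc';

# Simulate an earlier //g match that left pos() set inside the string.
$str =~ /c/g;
my $stale_pos = pos $str;             # defined: just past the first "c"

# Option 1: a //g match that always fails resets pos().
$str =~ /(?!)/g;
my $after_failed_match = pos $str;    # undef

# Option 2: set pos() again, then clear it by assigning undef.
$str =~ /c/g;
pos($str) = undef;
my $after_assignment = pos $str;      # undef

print defined($stale_pos)          ? "stale pos: $stale_pos\n" : "no stale pos\n";
print defined($after_failed_match) ? "still set\n" : "reset by failed match\n";
print defined($after_assignment)   ? "still set\n" : "reset by assignment\n";
```

Either way, the next //g match on $str starts searching from offset 0.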
    

      Oh right. Well it was a quickie answer anyway (at least I got the explanation right). I'd avoid /g there though since the OP might accidentally call it in list context and screw it all up. I'd probably just do the later match against a substr() lvalue like my $end = $start + (substr( $str, $start ) =~ /b/ ? $-[0] : 0);. Roughly like that anyway. I didn't mention that I tested none of this - I'm just writing it and figuring that I know how to speak perl correctly.

      I also have an aversion to pos() since I know that its behaviour is currently undefined with regard to local(). In this case that "bug" isn't relevant so I suppose I could go along with a use of pos(). Exactly what you'd do with it though... I dunno.

      # Maybe this. I think I'd rather just access $+[0] directly.
      my $start = $str =~ /a/g ? pos $str : 0;
      my $end = $str =~ /b/g ? $-[0] : 0;
      my $length = $end > $start ? $end - $start : 0;
      my $ext = $length ? substr( $str, $start, $length ) : undef;
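
The substr() idea floated earlier in this reply (run the second match only against the remainder of the string) can be sketched roughly as follows. The helper name between() and its undef-on-failure behaviour are assumptions made for this sketch, not part of the thread. It also copies the remainder into a variable instead of matching against substr() directly, purely for readability.

```perl
use strict;
use warnings;

# between() is a hypothetical helper name, not something from the thread.
# It returns the text between the first match of $open and the first match
# of $close that follows it, or undef if either pattern fails to match.
sub between {
    my ( $str, $open, $close ) = @_;

    return unless $str =~ $open;
    my $start = $+[0];          # just past the opening match

    # Search only the remainder, so an earlier "b" cannot be picked up.
    my $rest = substr $str, $start;
    return unless $rest =~ $close;
    my $length = $-[0];         # offset within $rest equals the length wanted

    return substr $str, $start, $length;
}

my $str = 'xbaaaaaa' . ( '_' x 20 ) . 'bcccccccccc';
my $ext = between( $str, qr/a+/, qr/b/ );
print defined($ext) ? "$ext\n" : "no match\n";
```

Because no /g is involved, pos() and list context never come into it. The cost is one extra copy of the tail of the string.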
Re: Re: print data between two regular expressions
by davis (Vicar) on Oct 08, 2003 at 08:47 UTC

    Ok, I'm not sure if I'm misreading the OP's spec, or you are.
    I think you mean:

    my $ext = substr $str, $start, $end - $start;
    (swapped end and start) - cheers

    davis
    It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.