eibwen has asked for the wisdom of the Perl Monks concerning the following question:

Is

/ ( (?<=prefix) (infix) (?=suffix) ) (??{ print $^N }) /gx
really the same as
/ prefix (infix) suffix (??{ print $^N }) /gx
as far as $^N = 'infix' is concerned? Actually I can't seem to find any net differences in the above. I came to this presumption after having used the former for a while and tested it with a fairly simple RE (such as the above), but wanted to confirm this is universally true for literal strings and ask if it might break otherwise.

Similarly, if they are equivalent, why doesn't

/ ( (?<=some(optional)?prefix) (infix) (?=suffix) ) (??{ print $^N }) /gx
optimize away to
/ some(optional)?prefix (infix) suffix (??{ print $^N }) /gx
instead of resulting in the Variable length lookbehind not implemented in regex error?

Lastly, (this might be redundant given answers to the above, but) when is usage of ?<= and ?= appropriate? The only possibility I can think of is something involving nested (), but I can only think of contrived examples at the moment. I seem to recall needing them in the past, but I can't seem to recall the implementation...

Re: Regular Expression Constructs ?<= and ?=
by davido (Cardinal) on Jan 12, 2006 at 02:11 UTC

    Your first two examples do differ in a few ways. First, you're capturing "infix" into \2 when you use the RE that contains lookahead/lookbehind. That's because you've got an extra set of capturing parens. The second regexp, the one where you're using literal text instead of lookahead and lookbehind, captures "infix" into \1, because it doesn't have that extra set of parens.

    Also, your first example essentially looks for "infix", and then checks to see if "prefix" comes immediately before it, and if "suffix" comes immediately after. The second example looks for "prefixinfixsuffix", pretty much all at once. In your simple example, (discounting the difference in capturing parens) there is no practical difference; you won't see a difference. But there are plenty of cases where lookahead and lookbehind are useful. One example is given in perlretut.
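
    To make the two differences above concrete, here is a minimal sketch using the question's own test string. It records the whole match ($&) and the capture groups for each form; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $string = 'prefixinfixsuffix';

# Lookaround version: the outer parens are group 1, (infix) is group 2.
# The lookarounds are zero-width, so the overall match is just "infix".
my ($look_whole, $look_g1, $look_g2);
if ($string =~ / ( (?<=prefix) (infix) (?=suffix) ) /x) {
    ($look_whole, $look_g1, $look_g2) = ($&, $1, $2);
}

# Literal version: (infix) is group 1, and the match consumes all three parts.
my ($lit_whole, $lit_g1);
if ($string =~ / prefix (infix) suffix /x) {
    ($lit_whole, $lit_g1) = ($&, $1);
}

print "lookaround: whole='$look_whole' \$1='$look_g1' \$2='$look_g2'\n";
print "literal:    whole='$lit_whole' \$1='$lit_g1'\n";
```

    Since $^N refers to the most recently closed group, it holds "infix" in both forms; what differs is the group numbering and how much of the string the match consumes.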

    As for why (?<=some(optional)?prefix) is allowed to generate an error, rather than simply being optimized to some(optional)?prefix, well, for one thing that would be inconsistent with the documentation which says that variable width lookbehind is not supported. Also, it would mean turning a lookbehind assertion (which doesn't "consume" any of the string it's matching against) into a string gobbling assertion in a way that couldn't be controlled by the person composing the regular expression; also contrary to what is defined in the documentation.
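
    If the goal is simply to avoid the error while keeping zero-width behaviour, one workaround (not something proposed in this thread, just a sketch) is to spell out each possible prefix as its own fixed-width lookbehind and alternate between them. The test strings below are made up for illustration.

```perl
use strict;
use warnings;

# Each lookbehind is fixed-width on its own, so the pattern compiles
# even where variable-length lookbehind is not implemented.
my $re = qr/ (?: (?<=someprefix) | (?<=someoptionalprefix) ) (infix) (?=suffix) /x;

my @report;
for my $s ('someprefixinfixsuffix',
           'someoptionalprefixinfixsuffix',
           'otherinfixsuffix') {
    # $-[0] is the offset where the overall match starts.
    push @report, ($s =~ $re) ? "$s: '$1' at offset $-[0]" : "$s: no match";
}
print "$_\n" for @report;
```

    This only scales to a handful of alternatives, and unlike the original it cannot capture the optional part, since nothing inside the lookbehinds is captured.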

    Just for kicks, have a look at the output of the following snippet:

    use strict;
    use warnings;
    use YAPE::Regex::Explain;

    my( @REx ) = (
        qr/((?<=prefix)(infix)(?=suffix))(??{ print $^N, "\n" })/,
        qr/prefix(infix)suffix(??{ print $^N, "\n" })/
    );
    my $string = "prefixinfixsuffix";
    for( @REx ) {
        my $exp = YAPE::Regex::Explain->new($_)->explain;
        print $exp;
        1 if $string =~ $_;
    }

    Dave

Re: Regular Expression Constructs ?<= and ?=
by ikegami (Patriarch) on Jan 12, 2006 at 02:26 UTC

    First, do you realize
    (??{ print $^N })
    is causing the regexp to try to match "1", the return value of print? You should probably use something that's guaranteed not to match like
    (??{ print("$^N\n"); '(?!)' })
    or something that's guaranteed to match like
    (??{ print("$^N\n"); '(?=)' })
    (depending on your intention) instead of relying on the return value of print and the presence/absence of the return value of print in the string to match.
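
    A small sketch of that point, with the print left out so the return value is explicit. The block returns 1 directly, which stands in for what print normally returns; the test strings are made up.

```perl
use strict;
use warnings;

# (??{ ... }) uses the block's return value as a pattern to match next.
# Returning 1 means a literal "1" must follow "infix" in the string.
my $one_present = ('infix1' =~ /(infix)(??{ 1 })/) ? 1 : 0;
my $one_absent  = ('infix'  =~ /(infix)(??{ 1 })/) ? 1 : 0;

# '(?=)' is an empty lookahead and always succeeds; '(?!)' always fails.
my $always = ('infix' =~ /(infix)(??{ '(?=)' })/) ? 1 : 0;
my $never  = ('infix' =~ /(infix)(??{ '(?!)' })/) ? 1 : 0;

print "returns 1, '1' follows:  $one_present\n";
print "returns 1, nothing more: $one_absent\n";
print "returns '(?=)':          $always\n";
print "returns '(?!)':          $never\n";
```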

    You use /g, so I assume you'll call the regexp more than once in a scalar context, or in a list context.

    So compare

    local $_ = 'aaaaabaaaaab';
    1 while / ( (?<=a) (a) (?=[ab]) ) (??{ print $^N; '(?=)' }) /gx;
    print("\n");

    with

    local $_ = 'aaaaabaaaaab';
    1 while / a (a) [ab] (??{ print $^N; '(?=)' }) /gx;
    print("\n");

    The first outputs "aaaaaaaa", while the second outputs "aaaa".
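
    To see where the extra matches come from, this sketch records the offset of each captured "a" instead of printing it. It drops the code blocks and uses @- for the offsets, which is my own addition rather than part of the comparison above.

```perl
use strict;
use warnings;

my $s = 'aaaaabaaaaab';

my (@look_at, @lit_at);

# Lookaround form: only the middle "a" is consumed, so the next /g
# attempt starts one character further along and matches can be adjacent.
push @look_at, $-[0] while $s =~ /(?<=a)(a)(?=[ab])/g;

# Literal form: all three characters are consumed, so the next attempt
# starts after them and the neighbouring candidates are skipped.
push @lit_at, $-[1] while $s =~ /a(a)[ab]/g;

print "lookaround offsets: @look_at\n";
print "literal offsets:    @lit_at\n";
```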

    (?=...) is useful when followed by something which isn't zero-width, when /g is used and when used in substitutions.

    (?<=...) is useful when /g is used and when used in substitutions.
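
    For the substitution case, a short sketch: with lookarounds only the middle part is replaced, whereas the literal form has to capture the context and put it back. The comma-insertion line is a well-known idiom added here as a second illustration, not something from this thread.

```perl
use strict;
use warnings;

# Lookarounds: the context is checked but not consumed, so only "infix" is replaced.
(my $with_look = 'prefixinfixsuffix') =~ s/(?<=prefix)infix(?=suffix)/INFIX/;

# Literal context: the match covers everything, so the context must be re-inserted.
(my $with_caps = 'prefixinfixsuffix') =~ s/(prefix)infix(suffix)/${1}INFIX$2/;

# Zero-width match under /g: insert a comma at every position that has a digit
# before it and a multiple of three digits between it and the end of the string.
(my $commas = '1234567') =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;

print "$with_look\n$with_caps\n$commas\n";
```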

      First, do you realize (??{ print $^N }) is causing the regexp to try to match "1", the return value of print? You should probably use something that's guaranteed not to match like (??{ print("$^N\n"); '(?!)' }) or something that's guaranteed to match like (??{ print("$^N\n"); '(?=)' })

      Or if you are only using the contents of the code block for its side effects, use (?{ ... }) which always succeeds and otherwise has no effect upon the matching process. From perlre:

      This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated.
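
      A minimal sketch of the side-effect-only form, applied to the question's literal pattern. The @seen array is a hypothetical name; it is declared with our on the assumption that a package variable is the safer choice inside regex code blocks on older Perls.

```perl
use strict;
use warnings;

our @seen;   # collects whatever $^N holds when the code block runs

# (?{ ... }) runs the code and always succeeds; its return value is not
# used as a pattern, so nothing extra has to appear in the string.
my $matched = ('prefixinfixsuffix' =~ /prefix(infix)suffix(?{ push @seen, $^N })/) ? 1 : 0;

print "matched: $matched, seen: @seen\n";
```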

        Or if you *do* want your code's evaluation to affect the regex's match, use the conditional expression (?(...)...|...).

        (?(?{ YOUR CODE HERE }) TRUE | FALSE )
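
        As a sketch of that template, here is a made-up pattern in which the code condition decides which word must follow a number. The pattern and test strings are hypothetical and only illustrate the mechanism.

```perl
use strict;
use warnings;

# If the number just captured ($^N) is even, "even" must follow;
# otherwise "odd" must follow.
my $re = qr/ (\d+) \s (?(?{ $^N % 2 == 0 }) even | odd ) /x;

my %result;
for my $s ('12 even', '7 odd', '7 even') {
    $result{$s} = ($s =~ $re) ? 'match' : 'no match';
}
print "$_: $result{$_}\n" for sort keys %result;
```

        Here the code's truth value selects a branch, rather than its return value being compiled as a pattern the way (??{ ... }) does.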


Re: Regular Expression Constructs ?<= and ?=
by mikeock (Hermit) on Jan 12, 2006 at 15:27 UTC