in reply to Re: Regex'ing backreferences
in thread Regex'ing backreferences

You unfortunately will disallow strings with bar *after* the second foo. What you want is this:
/foo((?:(?!bar).)*)foo/
Note that (?:stuff) is a non-capturing set of parens (it groups but does not save to $1), and (?!stuff) is a negative lookahead assertion.

UPDATE
I noticed a while ago that my RE was incorrect and decided to leave it for someone else to find and mention. Nobody did, so I will mention it myself: I forgot a ? after the * (to make it non-greedy):

/foo((?:(?!bar).)*?)foo/
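
To make the behaviour of the two regexes above concrete, here is a small sketch that runs both on a few sample strings. The sample strings and the helper name capture are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# The two versions from the post above: greedy, then non-greedy.
my $greedy = qr/foo((?:(?!bar).)*)foo/;
my $lazy   = qr/foo((?:(?!bar).)*?)foo/;

# Hypothetical helper: return $1 on a match, undef otherwise.
sub capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $str ('fooXYZfoo', 'foobarfoo', 'fooxfoobar', 'foo1foo2foo') {
    for my $pair ([greedy => $greedy], [lazy => $lazy]) {
        my ($name, $re) = @$pair;
        my $got = capture($re, $str);
        printf "%-12s %-6s %s\n", $str, $name,
            defined $got ? "'$got'" : 'no match';
    }
}
```

'foobarfoo' is rejected by both, 'fooxfoobar' is accepted by both (the bar comes after the second foo), and 'foo1foo2foo' is where the greedy and non-greedy versions capture different text.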

RE: RE (tilly) 2: Regex'ing backreferences
by japhy (Canon) on Sep 21, 2000 at 15:56 UTC
    While this method works, it is a bit slow, since it checks for the existence of 'bar' at EVERY character. Use the "unrolling the loop" technique, as found in "Mastering Regular Expressions":

    Update: tilly's right, mine fails to match on a specific type of string ("foobabfoo"). Fixed below.
    m{
      foo
      (                 # save to $1
        [^b]*           # 0 or more non-b characters
        (?:
          b+            # 1 or more b's
          (?:
            a(?!r)      # an 'a' NOT followed by an 'r'
            |           # or
            [^a]        # a non-a character
          )
          [^b]*         # 0 or more non-b characters
        )*              # that whole b... subset, 0 or more times
      )
      foo               # and foo
    }x;
    The general pattern behind unrolling the loop is to get your regex to look something like OK* ( NOTOK OK* )*.
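
As a smaller illustration of the OK* ( NOTOK OK* )* shape, here is a sketch using a different and simpler problem: a double-quoted string with backslash escapes. Here OK is any character other than a quote or backslash, and NOTOK is a backslash plus the character it escapes. The sample strings are made up, and this is only meant to show the shape, not the regex from this thread.

```perl
use strict;
use warnings;

# Alternation form: the engine picks a branch at every character.
my $plain    = qr/"((?:[^"\\]|\\.)*)"/s;

# Unrolled form: OK* ( NOTOK OK* )*
#   OK    = [^"\\]   (an ordinary character)
#   NOTOK = \\.      (a backslash and whatever it escapes)
my $unrolled = qr/"([^"\\]*(?:\\.[^"\\]*)*)"/s;

for my $str ('say "hi"', 'say "a \"quoted\" word"', 'no closing "quote') {
    my ($p) = $str =~ $plain;
    my ($u) = $str =~ $unrolled;
    printf "%-26s plain=%s unrolled=%s\n", $str,
        defined $p ? "[$p]" : 'no match',
        defined $u ? "[$u]" : 'no match';
}
```

Both forms should agree on every input; the unrolled one simply consumes runs of ordinary characters in one step instead of one alternation attempt per character.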

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
      Unrolling the loop is faster, but it is also more prone to mistakes. My version is the simplest that is legible yet does the specified job. My personal preference would be to wait to unroll it until after I find that I truly need to do so.

      I rarely do. Unlike lookaheads with wildcards in them, my version is, in the worst case, only a constant factor slower than the best possible implementation. (The worst case, of course, being things like "foobababababafoo".) Furthermore, my hope is that someday (not today, of course) Perl's RE engine won't need such mechanical optimizations, since it will do them on the fly.

      And a question for you. While you unrolled it correctly, how many people here do you think would have got it right on their first or second try? (Or ever: most, I'd bet, would have stopped before getting it right.) If your goal is correct code (which mine is), then be *very* careful before going through optimizations like this.

      UPDATE
      You did not, in fact, unroll it correctly. The "worst case" string I gave above should in fact match, yet it does not match your regex!

      Your current regex is:
      m{
        foo
        (                 # save to $1
          [^b]*           # 0 or more non-b characters
          (?:
            b+            # 1 or more b's
            (?:
              a(?!r)      # an 'a' NOT followed by an 'r'
              |           # or
              [^a]        # a non-a character
            )
            [^b]*         # 0 or more non-b characters
          )*              # that whole b... subset, 0 or more times
        )
        foo               # and foo
      }x;
      That incorrectly matches "foobbarfoo".
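
The false positive can be checked directly. The sketch below runs the unrolled regex quoted above and the lookahead version from earlier in the thread on "foobbarfoo"; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The unrolled regex as quoted above (plain b+, which may backtrack).
my $unrolled = qr{
    foo
    (
      [^b]*
      (?:
        b+
        (?: a(?!r) | [^a] )
        [^b]*
      )*
    )
    foo
}x;

# The lookahead version from earlier in the thread, used as the reference.
my $reference = qr/foo((?:(?!bar).)*)foo/;

my $str = 'foobbarfoo';
my ($u) = $str =~ $unrolled;
my ($r) = $str =~ $reference;

printf "unrolled:  %s\n", defined $u ? "matched, \$1 = '$u'" : 'no match';
printf "reference: %s\n", defined $r ? "matched, \$1 = '$r'" : 'no match';
```

The unrolled version matches because b+ first takes "bb", fails at the 'a', then backs off to a single 'b' and lets [^a] consume the second 'b', so "ar" slips through as ordinary non-b characters.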

      Again, optimization is the root of all evil. :-)

        Ok. Changed it:
        m{
          foo
          (                 # save to $1
            [^b]*           # 0 or more non-b characters
            (?:
              (?>b+)        # 1 or more b's (NO BACKTRACKING!)
              (?:
                a(?!r)      # an 'a' NOT followed by an 'r'
                |           # or
                [^a]        # a non-a character
              )
              [^b]*         # 0 or more non-b characters
            )*              # that whole b... subset, 0 or more times
          )
          foo               # and foo
        }x;
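
A rough sketch of what the (?>...) change buys: the first two patterns isolate the backtracking difference on the made-up string "bbar", and the third is the regex above applied to "foobbarfoo" and two made-up matching strings. This only checks those few strings; it is not a proof that the unrolled regex is equivalent to the lookahead version.

```perl
use strict;
use warnings;

# Plain b+ can give back a 'b' so that [^a] has something to consume.
my $backtracking = qr/^b+[^a]/;
# The atomic group keeps every 'b' it matched, so [^a] must face the 'a'.
my $atomic       = qr/^(?>b+)[^a]/;

# The regex above, with the atomic group in place.
my $fixed = qr{
    foo
    (
      [^b]*
      (?:
        (?>b+)
        (?: a(?!r) | [^a] )
        [^b]*
      )*
    )
    foo
}x;

printf "b+ on 'bbar':      %s\n", 'bbar' =~ $backtracking ? 'match' : 'no match';
printf "(?>b+) on 'bbar':  %s\n", 'bbar' =~ $atomic       ? 'match' : 'no match';

for my $str ('foobbarfoo', 'fooXYZfoo', 'foobazfoo') {
    my ($got) = $str =~ $fixed;
    printf "%-12s %s\n", $str, defined $got ? "'$got'" : 'no match';
}
```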

