RE (tilly) 2: Regex'ing backreferences

You unfortunately will disallow strings with bar *after* the second foo. What you want is this:

/foo((?:(?!bar).)*)foo/
[download]

Note that (?:stuff) is a non-grouping set of parens, and (?!stuff) is a negative assertion.

UPDATE
I noticed a while ago that my RE was incorrect, decided to leave it for someone else to find and mention. But nobody did so I will mention that I forgot a ?:

/foo((?:(?!bar).)*?)foo/
[download]

Comment on RE (tilly) 2: Regex'ing backreferences Select or Download Code

Replies are listed 'Best First'.
RE: RE (tilly) 2: Regex'ing backreferences by japhy (Canon) on Sep 21, 2000 at 15:56 UTC
While this method works, it is a bit slow, since it checks for the existence of 'bar' EVERY character. Use the "unrolling the loop" technique, as found in "Mastering Regular Expressions": Update: tilly's right, mine fails to match on a specific type of string ("foobabfoo"). Fixed below. `m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: b+ # 1 or more b's (?: a(?!r) # an 'a' NOT followed by an 'r' \| # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;` [download] The general pattern behind unrolling the loop is to get your regex to look something like `OK* ( NOTOK OK* )*`. `$_="goto+F.print+chop;\n=yhpaj";F1:eval`	[reply] [d/l]
RE (tilly) 4: Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 16:10 UTC
Unrolling the loop is faster, but it is also more prone to mistakes. My version is the simplest that is legible yet does the specified job. My personal preference would be to wait to unroll it until after I find that I truly need to do so. I rarely do. Unlike lookaheads with wildcards in them, the underlying implementation of mine is worst case off by a constant factor from the best possible. (The worst case, of course, being things like "foobababababafoo".) Furthermore my hope is that someday (not today of course) Perl's RE engine won't need such mechanical optimizations since it will do them on the fly. And a question for you. While you unrolled it correctly, how many people here do you think would have got it right on their first or second try? (Or ever - most in my bet would have stopped before getting it right.) If your goal is correct code (which mine is) then be very careful before going through optimizations like this. UPDATE You did not, in fact, unroll it correctly. The string I gave above of a "worst case" in fact should match and does not match your regex!	[reply]
RE (tilly) 4 (good try): Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 18:23 UTC
Your current regex is: `m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: b+ # 1 or more b's (?: a(?!r) # an 'a' NOT followed by an 'r' \| # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;` [download] That incorrectly matches "foobbarfoo". Again, optimization is the root of all evil. :-)	[reply] [d/l]
RE: RE (tilly) 4 (good try): Regex'ing backreferences by japhy (Canon) on Sep 21, 2000 at 18:28 UTC
Ok. Changed it: `m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: (?>b+) # 1 or more b's (NO BACKTRACKING!) (?: a(?!r) # an 'a' NOT followed by an 'r' \| # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;` [download] `$_="goto+F.print+chop;\n=yhpaj";F1:eval`	[reply] [d/l]