Re: Regex'ing backreferences

I think the hard part here is saying that you do NOT want bar in $1. If you wanted it there, it would be easier :-)

Something like this should hopefully do what you want:

/foo(?!.*?bar.*?)(.*?)foo/;

You first try to find 'foo'. OK. Then you check, with a zero-width assertion, whether the substring between that 'foo' and the next 'foo' contains 'bar'. If it does not contain 'bar', you can save it into $1.
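
A quick way to check that explanation is to run the pattern against a couple of strings. This is only a sketch: the helper name `between_foos` and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text, or undef when nothing matches.
sub between_foos {
    my ($text) = @_;
    return $text =~ /foo(?!.*?bar.*?)(.*?)foo/ ? $1 : undef;
}

# One string without 'bar' between the foos, one with it.
for my $text ('foo123foo', 'foobarfoo') {
    my $got = between_foos($text);
    printf "%-10s => %s\n", $text, defined $got ? "'$got'" : 'no match';
}
```

The first string should give '123' and the second should not match at all.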

Thank you :-) Thanks to your question I have understood why zero-width assertions are really cool!!

Hope this helps. Bye

James

Update: and what about doing:

/foo(?!.*?bar.*?foo)(.*?)foo/;

I think it is better, unless you show me another "worst case" :-))


Replies are listed 'Best First'.
RE (tilly) 2: Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 14:37 UTC
You unfortunately will disallow strings with bar after the second foo. What you want is this:

/foo((?:(?!bar).)*)foo/

Note that (?:stuff) is a non-capturing set of parens, and (?!stuff) is a negative lookahead assertion.

UPDATE: I noticed a while ago that my RE was incorrect, and decided to leave it for someone else to find and mention. But nobody did, so I will mention that I forgot a ? after the *:

/foo((?:(?!bar).)*?)foo/
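
For anyone who wants to try the corrected pattern, here is a minimal sketch. The variable name `$no_bar` and the sample strings are invented for this example.

```perl
use strict;
use warnings;

# (?!bar). consumes one character only if "bar" does not start at it;
# (?:...)*? repeats that lazily, so the capture stops at the nearest foo.
my $no_bar = qr/foo((?:(?!bar).)*?)foo/;

# Sample strings made up for this sketch.
for my $text ('foohellofoo bar', 'foobazfoo', 'foobarfoo') {
    if ($text =~ $no_bar) {
        print "$text => '$1'\n";
    } else {
        print "$text => no match\n";
    }
}
```

A 'bar' after the closing foo no longer matters, and 'baz' is allowed between the foos because the lookahead only rejects the exact sequence 'bar'.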
RE: RE (tilly) 2: Regex'ing backreferences by japhy (Canon) on Sep 21, 2000 at 15:56 UTC
While this method works, it is a bit slow, since it checks for the existence of 'bar' at EVERY character. Use the "unrolling the loop" technique, as found in "Mastering Regular Expressions".

Update: tilly's right, mine fails to match on a specific type of string ("foobabfoo"). Fixed below.

m{
  foo
  (             # save to $1
    [^b]*       # 0 or more non-b characters
    (?:
      b+        # 1 or more b's
      (?:
        a(?!r)  # an 'a' NOT followed by an 'r'
        |       # or
        [^a]    # a non-a character
      )
      [^b]*     # 0 or more non-b characters
    )*          # that whole b... subset, 0 or more times
  )
  foo           # and foo
}x;

The general pattern behind unrolling the loop is to get your regex to look something like `OK* ( NOTOK OK* )*`.

`$_="goto+F.print+chop;\n=yhpaj";F1:eval`
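
To make the `OK* ( NOTOK OK* )*` shape concrete, here is a separate and simpler instance of it than the foo/bar regex: matching a double-quoted string that may contain backslash escapes. It is a sketch only, and the variable names and sample input are assumptions made for the example.

```perl
use strict;
use warnings;

# OK    = any character except a double quote or a backslash
# NOTOK = a backslash followed by the character it escapes
my $quoted = qr{
    "
    [^"\\]*          # OK*
    (?:
        \\ .         # NOTOK
        [^"\\]*      # OK*
    )*
    "
}x;

# Made-up sample input; single quotes keep the backslashes literal.
my $line = 'say "a \"quoted\" word" now';
my ($match) = $line =~ /($quoted)/;
print "$match\n";
```

The OK and NOTOK parts never start with the same character here, so the engine has no ambiguous choice to backtrack over. That is the property that is hard to get right by hand for a multi-character exclusion like 'bar'.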
RE (tilly) 4: Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 16:10 UTC
Unrolling the loop is faster, but it is also more prone to mistakes. My version is the simplest that is legible yet does the specified job. My personal preference would be to wait to unroll it until after I find that I truly need to do so. I rarely do.

Unlike lookaheads with wildcards in them, the underlying implementation of mine is, in the worst case, off by a constant factor from the best possible. (The worst case, of course, being things like "foobababababafoo".) Furthermore, my hope is that someday (not today, of course) Perl's RE engine won't need such mechanical optimizations since it will do them on the fly.

And a question for you. While you unrolled it correctly, how many people here do you think would have got it right on their first or second try? (Or ever: my bet is that most would have stopped before getting it right.) If your goal is correct code (which mine is), then be very careful before going through optimizations like this.

UPDATE: You did not, in fact, unroll it correctly. The string I gave above as a "worst case" should in fact match, and does not match your regex!
RE (tilly) 4 (good try): Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 18:23 UTC
Your current regex is:

m{
  foo
  (             # save to $1
    [^b]*       # 0 or more non-b characters
    (?:
      b+        # 1 or more b's
      (?:
        a(?!r)  # an 'a' NOT followed by an 'r'
        |       # or
        [^a]    # a non-a character
      )
      [^b]*     # 0 or more non-b characters
    )*          # that whole b... subset, 0 or more times
  )
  foo           # and foo
}x;

That incorrectly matches "foobbarfoo". Again, optimization is the root of all evil. :-)
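
The claim is easy to reproduce. The sketch below copies the regex quoted above (comments shortened, and the name `$unrolled` is just for this example) and runs it on the two strings mentioned in this thread.

```perl
use strict;
use warnings;

# The unrolled regex from this thread, copied as-is apart from the comments.
my $unrolled = qr{
    foo
    (                   # capture group 1
        [^b]*           # 0 or more non-b characters
        (?:
            b+          # 1 or more b's
            (?:
                a(?!r)  # an 'a' NOT followed by an 'r'
                |       # or
                [^a]    # a non-a character
            )
            [^b]*       # 0 or more non-b characters
        )*
    )
    foo
}x;

for my $text ('foobababababafoo', 'foobbarfoo') {
    if ($text =~ $unrolled) {
        print "$text => '$1'\n";
    } else {
        print "$text => no match\n";
    }
}
```

The second string matches because `b+` can back off to a single b, which lets `[^a]` consume the next b and leaves "ar" to the trailing `[^b]*`, so 'bar' ends up inside the capture.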
RE: RE (tilly) 4 (good try): Regex'ing backreferences by japhy (Canon) on Sep 21, 2000 at 18:28 UTC
RE (tilly) 2 (still wrong): Regex'ing backreferences by tilly (Archbishop) on Sep 21, 2000 at 18:27 UTC
What about "foohellofoobarfoo"? You should be matching and putting "hello" into $1, but your new version won't.
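
A short sketch of that case, using the updated pattern from the top of the thread (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = 'foohellofoobarfoo';

# The updated pattern from the top of the thread.
my $updated = qr/foo(?!.*?bar.*?foo)(.*?)foo/;

# The lookahead scans to the end of the string, so it also sees the
# "bar" that sits after the closing foo and rejects the first foo.
my $result = $text =~ $updated ? "'$1'" : 'no match';
print "$text => $result\n";
```

The lookahead is anchored only at its start, so nothing limits it to the text before the closing foo.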