in reply to RE (tilly) 2: Regex'ing backreferences
in thread Regex'ing backreferences

While this method works, it is a bit slow, since it checks for the existence of 'bar' EVERY character. Use the "unrolling the loop" technique, as found in "Mastering Regular Expressions":

Update: tilly's right, mine fails to match on a specific type of string ("foobabfoo"). Fixed below.
m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: b+ # 1 or more b's (?: a(?!r) # an 'a' NOT followed by an 'r' | # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;
The general pattern behind unrolling the loop is to get your regex to look something like OK* ( NOTOK OK* )*.

$_="goto+F.print+chop;\n=yhpaj";F1:eval

Replies are listed 'Best First'.
RE (tilly) 4: Regex'ing backreferences
by tilly (Archbishop) on Sep 21, 2000 at 16:10 UTC
    Unrolling the loop is faster, but it is also more prone to mistakes. My version is the simplest that is legible yet does the specified job. My personal preference would be to wait to unroll it until after I find that I truly need to do so.

    I rarely do. Unlike lookaheads with wildcards in them, the underlying implementation of mine is worst case off by a constant factor from the best possible. (The worst case, of course, being things like "foobababababafoo".) Furthermore my hope is that someday (not today of course) Perl's RE engine won't need such mechanical optimizations since it will do them on the fly.

    And a question for you. While you unrolled it correctly, how many people here do you think would have got it right on their first or second try? (Or ever - most in my bet would have stopped before getting it right.) If your goal is correct code (which mine is) then be *very* careful before going through optimizations like this.

    UPDATE
    You did not, in fact, unroll it correctly. The string I gave above of a "worst case" in fact should match and does not match your regex!

RE (tilly) 4 (good try): Regex'ing backreferences
by tilly (Archbishop) on Sep 21, 2000 at 18:23 UTC
    Your current regex is:
    m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: b+ # 1 or more b's (?: a(?!r) # an 'a' NOT followed by an 'r' | # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;
    That incorrectly matches "foobbarfoo".

    Again, optimization is the root of all evil. :-)

      Ok. Changed it:
      m{ foo ( # save to $1 [^b]* # 0 or more non-b characters (?: (?>b+) # 1 or more b's (NO BACKTRACKING!) (?: a(?!r) # an 'a' NOT followed by an 'r' | # or [^a] # a non-a character ) [^b]* # 0 or more non-b characters )* # that whole b... subset, 0 or more times ) foo # and foo }x;


      $_="goto+F.print+chop;\n=yhpaj";F1:eval