in reply to question about reg exp engine

Alternation ('|') in a regex is not a short-circuit operator. It's not "substitute for the first alternative found"; rather, it's "substitute for any alternative found."

Some other points: I think you want ^\s not \^s, and your alternation form does not terminate the replacement part of the substitution (i.e., you're missing the last '/').
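
A small sketch of both points follows. The original patterns are not quoted in this reply, so the whitespace-trimming patterns, the sample strings, and the use of /g are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample string; the OP's data is not shown here.
my $text = "   hello world   ";

# Without /g the substitution stops after the first match,
# so only the leading whitespace is removed.
(my $once = $text) =~ s/^\s+|\s+$//;

# With /g every match of either alternative is replaced.
(my $all = $text) =~ s/^\s+|\s+$//g;

# \^s matches a literal caret followed by 's'; ^\s matches
# whitespace at the start of the string.
my $literal = ("a^s" =~ /\^s/) ? 1 : 0;   # literal "^s" is present
my $anchor  = ("  x" =~ /^\s/) ? 1 : 0;   # leading whitespace is present
my $typo    = ("  x" =~ /\^s/) ? 1 : 0;   # no literal "^s" in this string

print "[$once]\n[$all]\n";
```

With /g, both the leading and the trailing run are removed in one statement; without it, only the first match is replaced.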

Re^2: question about reg exp engine
by chuckd (Scribe) on Aug 03, 2008 at 22:37 UTC
    Yes, I have a typo in the post. It should be ^\s, but that still doesn't give me an answer. Why does it run slower than running all three substitutions on different lines?
      Why does it run slower than running all three substitutions on different lines?
      Because the first three are all optimisable; they are all explicitly anchored to the beginning or end of the string, and the regex engine is smart enough to try the match only at the beginning or end of the string, respectively.

      The combined pattern is too complex to be optimised, so the engine naively tries matching at every position in the (long) string.

      Dave.

      Yes, I have a typo in the post. It should be ^\s, but that still doesn't give me an answer.
      If you fix that typo and then run benchmarks, I think you will see that they are about the same speed. I see speed differences of 0-3% with the typo fixed, and 20-25% with the typo in place, probably because some optimization is possible when the regex says "beginning of string" rather than a literal '^' at an arbitrary place in the string.

        Show your benchmark. I see about a 75x difference between the single regex and the multiple regex solutions offered by the OP.

        Note that benchmarks are rather like statistics: Lies, damn lies and benchmarks.
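
        For anyone who wants to compare, a benchmark along these lines is one way to do it. This is only a sketch: the OP's actual patterns and data are not shown here, so the trimming patterns, the input string, and the sub names trim_separate and trim_combined are all hypothetical. Timings will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: a long string with leading and trailing whitespace.
# The OP's real data and patterns are not shown in this thread.
my $input = "   " . ("word " x 20_000) . "   ";

# Separate statements: each pattern carries its own anchor.
sub trim_separate {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Single statement: one pattern combining both with alternation.
sub trim_combined {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    separate => sub { trim_separate($input) },
    combined => sub { trim_combined($input) },
});
```

        Both subs should return the same string, so any difference in the table comes from how the engine searches, not from the work done.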


        Perl reduces RSI - it saves typing