in reply to Re^2: Surprisingly poor regex performance
in thread Surprisingly poor regex performance

Well, ^ doesn't mean the same thing as \n? at all. Not even close.

According to Programming Perl, 3rd edition (pages 150 and 159), ^ used with the /m modifier matches at the beginning of the string or just after an embedded newline. It is a zero-width assertion, so it never consumes the newline itself.

\n? means roughly "match a newline if there is one, otherwise match nothing." That gives the engine no information about where a match can start. Had you written something like /\n(.*$pat.*\n)/, you would have been better off (apart from never matching the first line). Of course, with /m you should be able to write /^(.*$pat.*)$/omg, and it should be about as efficient as my regex. (Leave /s off here: with /s, . also matches newlines, so .* would run across line boundaries.)
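A small sketch of that difference, using made-up sample text and a hypothetical $pat of 'foo' (neither comes from the original thread): the newline-anchored pattern can never capture the first line, while ^ and $ under /m (without /s) pick up every line that contains the pattern.

```perl
use strict;
use warnings;

# Hypothetical sample data; $pat stands in for the search pattern from the thread.
my $text = "foo first\nbar second\nfoo third\n";
my $pat  = 'foo';

# Anchoring on a literal newline: the first line has no preceding \n,
# so "foo first" can never be captured.
my @by_newline = $text =~ /\n(.*$pat.*\n)/g;

# Anchoring with ^ and $ under /m (no /s, so . stays within one line).
my @by_anchor = $text =~ /^(.*$pat.*)$/mg;

print "newline-anchored: ", scalar(@by_newline), " match(es)\n";
print "^/m-anchored:     ", scalar(@by_anchor),  " match(es)\n";
```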

With regexes, it's always better to be as explicit as possible, because that lets the engine apply more of its optimizations. As you have found out, those optimizations can mean the difference between 2500 seconds and 23 seconds.

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^4: Surprisingly poor regex performance
by sgifford (Prior) on Dec 13, 2004 at 23:30 UTC
    You're right, of course; I shouldn't have said "means the same as"; I meant "would have the same effect as".

    Still, I think it's accurate to say that ^ means very nearly the same thing as your example, (?:\A|\n), so it's quite surprising to me that yours is so much faster.

      I'm actually really curious about that, as well. I /msg'ed japhy asking if he'd pop in and help us out. My benchmarking shows a 15x speedup using (?:\A|\n) over using ^ with the /m modifier.

      Oh, and removing the /m modifier when using (?:\A|\n) gives perhaps a 1% speedup. I guess adding modifiers at random is bad. :-)
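      A rough way to reproduce this kind of comparison is sketched below with the core Benchmark module. The sample text, the 'needle' pattern and the sub names are made up for illustration, and the timings you get will depend on your perl version and data, so no particular speedup is implied. The script first checks that both patterns capture the same lines, since a faster regex only helps if it gives the same answer.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 5,000 lines, one in every 50 contains the pattern.
my $pat  = 'needle';
my $text = join '',
    map { $_ % 50 ? "line $_ of filler text\n" : "line $_ has a needle\n" } 1 .. 5_000;

# Two ways of saying "start of a line" in front of the same line-matching body.
sub with_caret   { my @m = $text =~ /^(.*$pat.*)$/mg;       return \@m }
sub with_newline { my @m = $text =~ /(?:\A|\n)(.*$pat.*)/g; return \@m }

# Sanity check: both approaches must capture exactly the same lines.
my ($caret_ref, $newline_ref) = (with_caret(), with_newline());
die "results differ\n"
    unless join("|", @$caret_ref) eq join("|", @$newline_ref);
print scalar(@$caret_ref), " matching lines\n";

# Run each variant 100 times and print a relative-speed table.
cmpthese(100, {
    caret_m    => \&with_caret,
    newline_or => \&with_newline,
});
```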

        How does it compare to (?:\A|(?<=\n))? Isn't that the accurate representation of /^.../m? You could also try (?<![^\n]), which seems somehow even simpler.

        - tye
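        Before timing these, it helps to confirm where each one actually matches. The sketch below prints the offset at which each assertion starts a match (the starts() helper and the test strings are hypothetical). (?:\A|\n) starts one character early because it consumes the newline, while (?:\A|(?<=\n)) and (?<![^\n]) are zero-width like ^. perlre also documents that under /m, ^ does not match after a newline that is the last character of the string, so the two lookbehind forms differ from ^ on the second string, which ends in "\n".

```perl
use strict;
use warnings;

# Return the offsets ($-[0]) at which $re starts a match in $str.
sub starts {
    my ($str, $re) = @_;
    my @at;
    while ($str =~ /$re/g) { push @at, $-[0] }
    return "@at";
}

my %pattern = (
    'caret /m'      => qr/^/m,
    '\A or \n'      => qr/(?:\A|\n)/,
    '\A or (?<=\n)' => qr/(?:\A|(?<=\n))/,
    '(?<![^\n])'    => qr/(?<![^\n])/,
);

# Second string ends in a newline to show the end-of-string edge case.
for my $str ("ab\ncd\nef", "ab\ncd\n") {
    (my $shown = $str) =~ s/\n/\\n/g;
    print qq{"$shown"\n};
    for my $name (sort keys %pattern) {
        printf "  %-15s %s\n", $name, starts($str, $pattern{$name});
    }
}
```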