in reply to Inexplicably slow regex

Thanks to Zigon for suggesting the -Mre=debug switch which helped explain what's going on. And thanks to grandfather and duckyd for providing insightful examples to help isolate the problem. And thanks to everyone else who spent any effort trying to help.

Here's what it appears is happening in the ^ case:
  1. The regex engine looks for a match for the fixed pattern $Key starting from the current position.
  2. If that pattern does not occur in the string then there can be no match and the regex fails.
  3. If that pattern does match, the engine backs up to the current position and starts matching the start of the pattern.
  4. When #3 fails, the engine advances the current position to the next line and goes back to #1, instead of #2 as might be expected.

So the problem is that the engine performs a scan of the input for the fixed pattern $KEY 10000 times - the first time starting on line 1, the next starting on line 2, etc. This is an n2 operation where n is the length of the string. For some reason the other two versions don't repeat step 1 so are much faster for large input.

Although there may be design issues requiring this behavior, from my perspective it appears to be a bug. Does anyone know where I should report this? Or whether it's a known shortcoming of the regex engine design and I shouldn't bother?

Replies are listed 'Best First'.
Re: Problem Identified
by sgt (Deacon) on Sep 13, 2006 at 22:58 UTC

    please do report it: send to p5p a short report like grandfather's test program so that people who know the innards of the regex engine can look at it. thanks