in reply to Re: Perl regexp matching is slow??
in thread Perl regexp matching is slow??
The big difference between these two is whether you think regular expressions should be treated as descriptive (here's the effect I want) or prescriptive (here's how I want you to do it).
Well, both are at play in the Perl engine. If it were fully prescriptive, the engine wouldn't be able to do things like use fixed-string optimisations.
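To make the fixed-string point concrete, here is a minimal sketch of the idea in Python rather than in Perl's internals. The function name and the `required_literal` parameter are hypothetical, and the caller is assumed to know that every possible match must contain that literal; a real engine derives this from the pattern itself.

```python
import re

def search_with_literal_prefilter(pattern, required_literal, text):
    """Run a regex search only if a literal that every match must contain is present.

    Assumption: the caller guarantees that `required_literal` appears in
    every string the pattern can match. This sketch does not check that.
    """
    # Cheap substring scan first: if the literal is absent, no match is possible,
    # so the (potentially expensive) regex engine is never started.
    if required_literal not in text:
        return None
    # The literal is present, so fall back to the full regex to verify.
    return re.search(pattern, text)
```

The engine is free to do this only because the pattern is read as a description of what to match, not as a literal recipe for how to match it.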
Then it would be plenty safe to take a regexp from the user and use it to run a search. But with the current Perl regexes, no way.
Doesn't this criticism apply equally to Thompson's algorithm or to DFA construction? I would have thought the only difference would be that with a DFA you'd see the performance issues at compilation rather than at execution.
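As a rough illustration of the compile-time cost of a DFA, the sketch below runs the subset construction on a textbook NFA for `(a|b)*a(a|b){n}` and counts the DFA states it reaches. The function name and the state numbering are assumptions made for this example, and it builds the full DFA eagerly; an implementation that builds states lazily during matching would behave differently.

```python
def count_dfa_states(n):
    """Count DFA states reachable by subset construction for (a|b)*a(a|b){n}.

    NFA states are numbered 0..n+1 (an assumption of this sketch):
      state 0 loops on 'a' and 'b', and also moves to state 1 on 'a';
      states 1..n move to the next state on either character;
      state n+1 is accepting and has no outgoing transitions.
    """
    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)          # the (a|b)* loop
                if ch == "a":
                    nxt.add(1)      # guess that this 'a' is the marked one
            elif s <= n:
                nxt.add(s + 1)      # consume one of the n trailing characters
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    todo = [start]
    while todo:
        cur = todo.pop()
        for ch in "ab":
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)
```

For this family of patterns the count doubles each time `n` grows by one, while the NFA only gains one state. The cost moves from execution to construction; it doesn't disappear.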
And just to be clear, that is no more incompatible with whatever optimizations you might add (like exact string search) than backtracking is.
Sure. But the question is: will construction time + FBM (Fast Boyer-Moore) + verification be faster for a BNFA (backtracking NFA) than for a DFA? And will the DFA consume radically more memory than the BNFA? My position is that the DFA will most likely win only on degenerate patterns. The rest of the time my feeling is that the BNFA will win hands down, mostly because of how cheap construction is.
It's great that Perl 5.10 is going to have a hot-swappable regular expression engine, because then maybe someone could build one that handles the true-regular-expression notation in guaranteed linear time and then Perl programmers could ask for it if they wanted a guarantee of predictable behavior.
Yes, this is exactly why I was keen on making the regex engine pluggable. We have seen one proof of concept for it, but it's not well tested. It would be nice to see someone do a plug-in that uses a different algorithm.
Even better, that would pave a way to having Perl check for the non-regular operators, and if they weren't there, choose the linear-time implementation automatically.
I agree, it would be nice to automatically swap over to a better algorithm under some circumstances, assuming you did so only when the matching semantics were equivalent to those provided by leftmost-longest.
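The semantic gap is easy to show with a toy alternation. Python's `re` module, like Perl, takes the first alternative that succeeds rather than the longest one. In the sketch below both helper names are hypothetical, and the "longest" variant only emulates leftmost-longest for a top-level alternation anchored at the start of the text; it is not a general POSIX matcher.

```python
import re

def first_alternative_match(alternatives, text):
    """Match at the start of text with ordered alternation, as a backtracking engine does."""
    pattern = "|".join(f"(?:{alt})" for alt in alternatives)
    m = re.match(pattern, text)
    return m.group(0) if m else None

def longest_alternative_match(alternatives, text):
    """Emulate leftmost-longest for a top-level alternation by trying every branch."""
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        # Keep the longest match seen so far; ties keep the earlier alternative.
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best
```

With the alternatives `a` and `ab` against the text `ab`, the first helper returns `a` and the second returns `ab`. An automatic swap would have to prove that a pattern cannot tell the two apart.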
Thanks for writing the article. Regardless of the debate over backtracking NFAs versus more DFA-like structures, I think it was well written and quite informative.