in reply to Re^2: Perl regexp matching is slow??
in thread Perl regexp matching is slow??
Well lets see, the title, and the lack of any kind of analysis of how a backtracking NFA can mitigate its weaknesses.
For instance the pattern used for the performance analysis could be optimised to /a{n,2n}/, and if written by a human probably would be. And the optimised construct would not exhibit the poor performance that he graphed out. In the discussion in the section marked "Real world regular expressions" he mentions that an NFA or DFA would do the opposite (unrolling the expression) but he doesnt discuss the impact of such a translation. Unrolling such a construct would result in an unmanagably large construction. A DFA that matches a{1,30000} is going to need at minum 30k states and transitions in it. Thats big. Now the same thing would be represented in a backtracking NFA like perls in what four regops, lets be generous and say that each regop is going to be 10 words (its more like 1 or 2 per regop), so we would need 40 words to represent that pattern in an backtracking-NFA. A DFA or a Thompsons NFA would 30k states, assuming each transition is only a word that is still a /huge/ difference in memory footprint, and in construction time. Memory for 30k nodes would have to be allocated blah blah. My bet is that before either a DFA or Thompsons NFA had finished construction the perl backtracking NFA would have already finished the match.
Another thing, the benchmarks showed only one side of things, the performance of an accepting match. How fast will thompson construction be when the match fails? Perls regex engine will reject the match in less than N characters comparisons because it will use FBM matching to locate the an segment. Its only when it matches that it behaves poorly, and that can be mitigated by using a better construct, which hypothetically can be optimised automatically at the engine level.
So its like the paper takes the tone "this is much better than that" and then glosses over all the hard stuff, doesnt mention the negative tradeoffs of the "better" solution, and gives no credit to the existing engines and the solutions they use, and makes no admission that there are optimisation techniques for working around the potential degenerate cases that might be encountered.
Update: tweaked some wording that wasnt clear
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^4: Perl regexp matching is slow??
by samizdat (Vicar) on Jan 31, 2007 at 13:30 UTC | |
by demerphq (Chancellor) on Feb 01, 2007 at 08:24 UTC | |
by samizdat (Vicar) on Feb 02, 2007 at 11:56 UTC | |
Re^4: Perl regexp matching is slow??
by roboticus (Chancellor) on Jan 31, 2007 at 13:03 UTC |