
Well, let's see: the title, and the lack of any kind of analysis of how a backtracking NFA can mitigate its weaknesses.

For instance, the pattern used for the performance analysis could be optimised to /a{n,2n}/, and if written by a human it probably would be. The optimised construct would not exhibit the poor performance that he graphed. In the section marked "Real world regular expressions" he mentions that an NFA or DFA would do the opposite (unrolling the expression), but he doesn't discuss the impact of such a translation. Unrolling such a construct would result in an unmanageably large construction. A DFA that matches a{1,30000} is going to need at minimum 30k states and transitions. That's big.

Now, the same thing would be represented in a backtracking NFA like Perl's in, what, four regops? Let's be generous and say that each regop is going to be 10 words (it's more like 1 or 2 per regop), so we would need 40 words to represent that pattern in a backtracking NFA. A DFA or a Thompson NFA would need 30k states; even assuming each transition is only a word, that is still a /huge/ difference in memory footprint, and in construction time. Memory for 30k nodes would have to be allocated, and so on. My bet is that before either a DFA or a Thompson NFA had finished construction, the Perl backtracking NFA would already have finished the match.
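To put rough numbers on the size argument, here is a minimal sketch. It is in Python only so that it is engine-neutral and easy to run; nothing in it touches Perl's engine. The helper names (`unrolled_chain`, `run_chain`, `same_language`) and the four-field `compact` tuple are made up for illustration, and the tuple is not Perl's actual regop layout. It unrolls a{1,30000} into one state per count of 'a' seen, and checks on a small n that a?ⁿaⁿ and a{n,2n} accept the same strings.

```python
import re

def unrolled_chain(lo, hi):
    """Unroll a{lo,hi} (anchored at both ends) into an explicit automaton.

    State i means "i copies of 'a' consumed so far".
    """
    transitions = {i: i + 1 for i in range(hi)}  # on 'a': state i -> i + 1
    accepting = set(range(lo, hi + 1))           # lo..hi copies is a match
    return transitions, accepting

def run_chain(transitions, accepting, text):
    """Run the unrolled automaton over text; any other character rejects."""
    state = 0
    for ch in text:
        if ch != "a" or state not in transitions:
            return False
        state = transitions[state]
    return state in accepting

def same_language(n):
    """Compare a?^n a^n with a{n,2n} on runs of 0..3n 'a' characters."""
    slow = re.compile("a?" * n + "a" * n)        # the shape used in the paper
    fast = re.compile("a{%d,%d}" % (n, 2 * n))   # the hand-optimised form
    return all(
        bool(slow.fullmatch("a" * k)) == bool(fast.fullmatch("a" * k))
        for k in range(3 * n + 1)
    )

transitions, accepting = unrolled_chain(1, 30000)
num_states = len(transitions) + 1  # states 0..30000

# Hypothetical compact form: one counted-repeat node with four fields.
compact = ("REPEAT", "a", 1, 30000)

print(num_states, "unrolled states vs", len(compact), "fields")
print("a?^3 a^3 matches the same runs as a{3,6}:", same_language(3))
```

The unrolled table grows linearly with the upper bound, while the counted form stays the same size whatever the bounds are. Keep n small in `same_language`: the a?ⁿaⁿ form is exactly the case that blows up under backtracking.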

Another thing: the benchmarks showed only one side of things, the performance of an accepting match. How fast will Thompson construction be when the match fails? Perl's regex engine will reject the match in fewer than N character comparisons, because it will use FBM (fast Boyer-Moore) matching to locate the aⁿ segment. It's only when it matches that it behaves poorly, and that can be mitigated by using a better construct, which hypothetically could be optimised automatically at the engine level.
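The idea behind that fast rejection can be sketched as a "required literal" check: any match of a{n,2n} must contain a run of n 'a' characters, so if that run is absent the engine never needs to start. The sketch below is an assumption-laden illustration in Python, not Perl's implementation. `guarded_search` and `has_required_literal` are hypothetical names, and `str.find` merely stands in for an FBM scan.

```python
import re

def has_required_literal(text, required):
    """Stand-in for an FBM scan: is the mandatory substring present at all?"""
    return text.find(required) != -1

def guarded_search(n, text):
    """Search for a{n,2n}, but only after the cheap substring check passes."""
    required = "a" * n  # every match of a{n,2n} contains this run
    if not has_required_literal(text, required):
        return None     # rejected without starting the regex engine
    return re.search("a{%d,%d}" % (n, 2 * n), text)

# Failing case: only four 'a's in the subject, so the prefilter rejects it.
print(guarded_search(5, "b" * 100 + "aaaa"))

# Accepting case: the prefilter passes and the regex does the real work.
print(guarded_search(5, "xxaaaaaaayy").group())
```

A real Boyer-Moore style scan can also skip ahead on a mismatch, which is where the "fewer than N comparisons" figure comes from; this sketch shows only the control flow, not that skipping.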

So it's like the paper takes the tone "this is much better than that" and then glosses over all the hard stuff: it doesn't mention the negative tradeoffs of the "better" solution, gives no credit to the existing engines and the solutions they use, and makes no admission that there are optimisation techniques for working around the degenerate cases that might be encountered.

Update: tweaked some wording that wasn't clear.

---
$world=~s/war/peace/g

In reply to Re^3: Perl regexp matching is slow?? by demerphq
in thread Perl regexp matching is slow?? by smahesh
