The article is biased towards FA methods and mistreats backtracking by deliberately focusing on backtracking worst case while omitting FA ones. That is quite bad, since it on top in Google up to now, misleading people.

"This argument misses the point: given a choice between an implementation with a predictable, consistent, fast running time on all inputs or one that usually runs quickly but can take years of CPU time (or more) on some inputs, the decision should be easy." - this is simply wrong. No known (at least to me) regular expression maching algorithm have fast running time on ALL inputs.

Let's see how NFA will treat regular expressions like a{2,N} (using PCRE notation) where N is a large number - e.g. finite quantifier with large difference between left and right border. Backtracking match aaa in no time with this, while with big N NFA didn't even start matching, taking ages to build vast automaton. Like a{2,2000} etc. The graphs would be quite reversed.

I'm not a particular fan of backtracking, but that comparison isn't fair at all. Each regular expression matching algorithm has it's strong and weak points. And the only practical situation where these worst cases come to life I know of is DoS attacks on services that allows someone to supply regex.


In reply to Re: Perl regexp matching is slow?? by Anonymous Monk
in thread Perl regexp matching is slow?? by smahesh

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.