The big difference between these two is whether you think regular expressions should be treated as descriptive (here's the effect I want) or prescriptive (here's how I want you to do it).

Well, both are at play in the Perl engine. If it were fully prescriptive, the engine wouldn't be able to do things like use fixed-string optimisations.

Then it would be plenty safe to take a regexp from the user and use it to run a search. But with the current Perl regexes, no way.

Doesn't this criticism apply equally to Thompson's algorithm or to DFA construction? I would have thought the only difference would be that with a DFA you'd see performance issues at compilation rather than at execution.
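To make the compile-time cost concrete, here is a minimal sketch of the textbook worst case for eager subset construction. It is written in Python rather than Perl, and the function names `nfa_nth_from_end` and `count_dfa_states` are made up for this illustration. The pattern is (a|b)*a(a|b){n-1}, i.e. "the n-th character from the end is an a". The NFA needs only n+1 states, but the DFA built from it needs 2**n.

```python
def nfa_nth_from_end(n):
    """Transition table of an NFA for (a|b)*a(a|b){n-1}.

    States are 0..n; state 0 is the start state and state n is accepting.
    """
    delta = {
        (0, "a"): {0, 1},  # stay in the loop, or guess this 'a' is the n-th from the end
        (0, "b"): {0},
    }
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(n):
    """Count the DFA states produced by eager subset construction."""
    delta = nfa_nth_from_end(n)
    start = frozenset([0])
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for ch in "ab":
            # The DFA state reached on `ch` is the set of all NFA states reachable on `ch`.
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


for n in (4, 8, 12):
    print(n, "NFA states:", n + 1, "DFA states:", count_dfa_states(n))
```

This only shows the eager case. An engine that builds DFA states lazily, as they are needed during matching, pays for the states it actually visits rather than for all of them up front.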

And just to be clear, that is no more incompatible with whatever optimizations you might add (like exact string search) than backtracking is.

Sure. But the question is: will construction time + FBM (Fast Boyer-Moore, Perl's fixed-string search) + verification be faster for a BNFA (backtracking NFA) than for a DFA? And will the DFA consume radically more memory than the BNFA? My position is that the DFA will most likely win only on degenerate patterns. The rest of the time, my feeling is that the BNFA will win hands down, mostly because of how cheap construction is.
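The "fixed string search, then verification" idea can be sketched roughly as follows. This is Python rather than Perl, `search_with_prefilter` is a hypothetical helper, and `str.find` merely stands in for FBM. The caller supplies the required literal by hand here, whereas a real engine would derive it from the pattern when compiling it.

```python
import re


def search_with_prefilter(pattern, required_literal, text):
    """Run `pattern` against `text` only if `required_literal` occurs in it.

    Assumption: every possible match of `pattern` contains `required_literal`.
    The caller is responsible for that; nothing here checks it.
    """
    if text.find(required_literal) == -1:
        # No match is possible, so the regex engine is never started.
        return None
    # Verification step: the literal is present, so let the full matcher decide.
    return re.search(pattern, text)


hit = search_with_prefilter(r"\w+@example\.com", "@example.com",
                            "mail bob@example.com now")
miss = search_with_prefilter(r"\w+@example\.com", "@example.com",
                             "nothing to see here")
print(hit.group() if hit else None, miss)
```

Nothing in this step depends on what the verifying matcher is, so the same prefilter could sit in front of a backtracking engine or a DFA.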

It's great that Perl 5.10 is going to have a hot-swappable regular expression engine, because then maybe someone could build one that handles the true-regular-expression notation in guaranteed linear time and then Perl programmers could ask for it if they wanted a guarantee of predictable behavior.

Yes, this is exactly why I was keen on making the regex engine pluggable. We have seen one proof of concept for it, but it's not well tested. It would be nice to see someone write a plug-in that uses a different algorithm.

Even better, that would pave a way to having Perl check for the non-regular operators, and if they weren't there, choose the linear-time implementation automatically.

I agree, it would be nice to automatically swap to a better algorithm under some circumstances, assuming you did so only when the matching semantics were equivalent to those provided by leftmost-longest.
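A small illustration of why the semantics matter, again in Python, whose `re` module resolves alternation the same way a backtracking engine does (the first alternative that succeeds wins). The helpers `leftmost_first` and `leftmost_longest` are hypothetical names, and the second one only emulates leftmost-longest by brute force over literal alternatives.

```python
import re


def leftmost_first(alternatives, text):
    """Result of alt1|alt2|... at the start of `text` under backtracking semantics."""
    m = re.match("|".join(re.escape(a) for a in alternatives), text)
    return m.group() if m else None


def leftmost_longest(alternatives, text):
    """Result under leftmost-longest semantics, emulated by trying every alternative."""
    candidates = [a for a in alternatives if text.startswith(a)]
    return max(candidates, key=len) if candidates else None


print(leftmost_first(["a", "ab"], "ab"))    # a
print(leftmost_longest(["a", "ab"], "ab"))  # ab
print(leftmost_first(["ab", "a"], "ab"))    # ab, the two agree for this ordering
```

For the first ordering the two disagree, so swapping engines would silently change what is matched. An automatic switch would be safe only for patterns where the two results cannot differ.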

Thanks for writing the article. Regardless of the debate over backtracking NFAs versus more DFA-like structures, I think it was well written and quite informative.

---
$world=~s/war/peace/g


In reply to Re^2: Perl regexp matching is slow?? by demerphq
in thread Perl regexp matching is slow?? by smahesh
