I'm sure that demerphq will comment further, but the article glosses over important implementation details such as Unicode, lookaround and backreferences, all of which are left as an exercise for the astute reader. That is fine for an academic paper, but it has little bearing on what Perl does unless it is accompanied by working code.

The author chooses to represent character classes as alternations, which is impractical with Unicode and impractical memory-wise even for a class like [^a], which uses at least 32 bytes of memory in the ASCII space and all of the Unicode space otherwise.
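To put rough numbers on that, here is a minimal sketch (in Python, purely for illustration) that compares expanding [^a] into one alternation branch per character with storing it as a bitmap over an 8-bit alphabet. The helper names are made up for this example, and this is neither the article's code nor how Perl's engine stores classes.

```python
def class_as_alternation(excluded, alphabet_size=256):
    """Expand a negated class such as [^a] into one branch per allowed code point."""
    return [cp for cp in range(alphabet_size) if chr(cp) not in excluded]


def class_as_bitmap(excluded, alphabet_size=256):
    """Store the same class as one bit per code point (32 bytes for 256 code points)."""
    bitmap = bytearray(alphabet_size // 8)
    for cp in range(alphabet_size):
        if chr(cp) not in excluded:
            bitmap[cp >> 3] |= 1 << (cp & 7)
    return bitmap


def bitmap_matches(bitmap, ch):
    """Test membership with a single bit lookup."""
    cp = ord(ch)
    return cp < len(bitmap) * 8 and bool(bitmap[cp >> 3] & (1 << (cp & 7)))


branches = class_as_alternation("a")   # one alternation branch per allowed character
bitmap = class_as_bitmap("a")          # fixed-size bit vector for the same class
print(len(branches), "branches versus", len(bitmap), "bytes")
```

Over the full Unicode range (0x110000 code points) the alternation form of [^a] would need more than a million branches, which is why neither plain form scales there without some kind of range or property representation.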

Zero-width assertions are unimplemented, and the author describes them as "hard but possible in general". For backreferences, the author even suggests using two different engines, because the machine cannot do backreferences.
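A small illustration of why backreferences fall outside what a pure automaton can do: the pattern below only matches when the run of a's after the b has exactly the same length as the run before it, and that set of strings is not a regular language. This is a sketch using Python's backtracking `re` module as a stand-in; the function name is hypothetical, and the equivalent Perl pattern would be /^(a+)b\1$/.

```python
import re

# (a+) captures a run of a's; \1 demands the identical run again after the b.
SAME_RUN = re.compile(r"(a+)b\1")


def same_run_on_both_sides(text):
    """Return True if text is some run of a's, a single b, then the same run again."""
    return SAME_RUN.fullmatch(text) is not None


for candidate in ("aba", "aabaa", "aabaaa"):
    print(candidate, same_run_on_both_sides(candidate))
```

An engine has to remember what the group captured in order to compare it later, which a fixed set of automaton states cannot do for runs of arbitrary length.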

I'm sure somebody with real insight into how regular expression engines work nowadays (and Perl's especially) can point out where the technique might be applicable, but I don't see how the method can replace the Perl regex engine as a whole.


In reply to Re: Perl regexp matching is slow?? by Corion
in thread Perl regexp matching is slow?? by smahesh
