In order to satisfy that "descriptive" you would have to build DFA if for no other reason then to make all |-s equal and to deduce non-greedy multipliers. Just to remind you that once you cross the Rubicon to "descriptive" non-greedy intent has to be deduced.
So, at the very least, you would incur perf hit all the time for the sake of a fringe case. Doesn't really matter if you would hide it behind the dynamic building. So if the DFA has the exponential explosion of states you would eventually reach the point of explosion. Sorry I can't find the one that Ullman used to throw at unsuspecting souls :-), but I'm sure you'll find enough of these on the net ...
You might get away with a stuff like "on the fly" DFA for something that is one-time run, but in such case the whole debate is artificial and academic.
For a real world run, when say I have a regex compiled all the way to the binary, say in .NET, and am paying the price of never being able to "unload" it for the lifetime of the binary, and the binary happens to be a server required to hit 99.9999 and is running that regexp on all incoming text ... I think you get the picture :-) and what's the purpose of non-greedy multipliers and all these assertions you dislike so much and why even a thought that someone might be building an DFA on the fly is not a realistic option. Even if it could handle back-references, or more importantly a recursive parse with implicit stack - which it can't.
See, we actually need the interpretative, ahem interpretation :-) of the regex and the one that is stable across languages and platforms - thereby the PCRE and the only real "price" we "pay" is that we use non-greedy multipliers and we are done. Most horrible thing, isn't it :-)
Now we could debate that we might prefer to have non-greedy as a default but that's a historical note :-) Also a matter of how precisely one wants to define the input he'll take in. Some people may insist on using negative classes instead, but they don't write regexps to match a in a :-).
As for the back-refs and recursive parse, that's going to be more in demand, not less. For example .NET implementation of Perl regex was designed to be able to parse XML without a perf hit and while I couldn't say if Perl would take a perf hit for something like that, it definitely can parse XML with regex much easier then .NET.
So, you can go cursing everyone, starting with Thompson himself :-) or you can kind of come to terms with the fact that regex engines in widespread use are practical tools and not academic toys. So, claiming that one engine is "slow" because it will have a problem with a ridiculous thing like a?{3}a{3} on "aaa" but will run a{0,30}?a{30} without a hiccup looks like the ultimate exercise in futility. If for some reason you really, honestly need to match something like that, well you can always write a regex, to transform "a?{n}a{n}" into "a{0,n}?a{n}" :-)) Might need back-refs to handle more convoluted cases though :-)))))
In the meantime I will happily cherish Perl/PCRE/.NET etc. regex in all it's universal presence and benefits it brings - for all people with real needs for fast string parsing.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.