Ok's, I've gotchas now tilly.

I understand the difference between NFA's and DFA's, in fact I think it would be nice if you could have a DFA-preference modifier in Perl on your regex's so that if they are DFA-compatible, they would use the DFA instead of an NFA. (or even as egrep does, -pretend- they are DFA compatible, then check your answer with an NFA).

I'm no wiz with regex's though, but what you propose seems very, very difficult! At least, it seems that if you told it to do even something simple like: /vr(o*)*m/ (which if I'm thinking correctly would cause a normal NFA to go ga-ga), you would be trying to get the regex engine to realize first that perhaps we should either ignore the double *, or (as it would be in more complicate situations) reverse the processing order of the *'s, or even something more 'odd'.

And so my point is, even if you can get the regex engine to realize it can do something to avert catastrophe, couldn't many of the 'solutions' that the regex decides upon change which match the NFA chooses? That is, changed from what the original badly written form asked for? The whole intent on using an NFA is so you can choose optimized or taylored regex's that either give you speed or choice in your matches (and sometimes both :) ). But wouldn't this sort of optimization make which match the NFA picks sort of, non-obvious? (in a more non-obvious than normal for regex's way)?

Again, I'm not an expert regex'er, but it seems using a DFA in these sorts of situations (even if a DFA can't be used directly), would be similar -- you would avoid the pitfalls of killing the performance, but you suffer in that maybe you're not getting the match you expected.

Well, I'm probably brain-fried right now, so don't pay any attention to me :) .

Ciao,
Gryn

p.s. for those that are in the dark the main difference between a NFA and a DFA is one of performance and control.

A DFA takes almost always the same amount of time to make a match, and it will always make the same one for equivalent regex's (the longest of the leftmost (er, or the otherway around, sorry)).

A NFA is the opposite, it may (often) be much fater than a DFA, but it also may get stuck in a rut and finish sometime after the Sun burns out. Besides the speed improvement, you often get a control advantage. That is, for equivalent regex's (ones that would match the same thing, but don't look exactly the same because of reordering or what-not) you can control which match you want by rearranging your regex (which can't be done for DFA's).


In reply to Re: Re (tilly) 3: Research ideas by gryng
in thread Research ideas by Wodin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.