Well, I think O() notation breaks down for recursive descent, especially in the absence of information about the grammar and the kind of input expected. Certainly it can't do better than O(I), but it doesn't have to do a lot worse unless you're planning to do a lot of backtracking. As with Perl 5 regex behavior, you're going to do much better with patterns that succeed than with patterns that fail. I suspect memoization is only going to help you if you're in the situation of failing frequently. After all, if you're succeeding, you're changing which token you're looking at, and that will by definition blow up any memoization cache because you're changing your inputs.

For recursive descent without backtracking, you're going to pay in only two dimensions, one of which is the length of the input, but the other of which is the depth of your grammar, modulated by how often your input forces you to traverse that depth. Your typical computer language forces you down through all the levels of precedence on every term.

Now, as it happens, Perl 6's grammar is a fairly deep grammar in terms of precedence levels, so that would be a big hit for a pure recursive descent approach. Fortunately, as chromatic pointed out, we're not really purists. So our approach is to try to have our cake and eat it too. We'll use recursive descent as the default semantics, but allow various declarative ways of modifying that. In particular, for parsing Perl 6 itself, we'll intermix an operator precedence grammar for expression parsing, and just use recursive descent for large scale structures as well as individual tokens that contain expressions. That should give us about a 20-times speedup over pure recursive descent, give or take a few orders of magnitude.

This approach also allows us to insert new precedence levels on the fly if the user requests them. When a user adds a macro or an operator, they might not even realize they're changing the Perl grammar on the fly.

But even leaving macros and such out of it, we're also trying to keep the whole ordinary rule syntax declarative enough that you could use a pragma at the top of your grammar to install some other kind of grammar engine, LALR or whatever. As long as the effects are limited to the current lexical scope, that's fine. "All is fair if you predeclare."

But the basic approach remains recursive descent in flavor, because for all its faults, recursive descent can give you very good error messages. Another hybrid notion that we'll likely be exploring is to go ahead and do a fast bottom-up parse in spots, but when something goes haywire syntactically, go ahead and backtrack a little to try a recursive descent parse, not to try and second-guess the bottom-up parse and "fix up" the user's code, but rather to try and determine where the user went wrong, and give some guidance. Epistemologically speaking, a recursive descent parse can keep track of the dubious decision points of the user and see if a different hypothetical decision would have resulted in a successful parse. If so, we don't actually do any code generation--the code is wrong, after all--but just whack the user upside the head with a more clueful cluebat than we could have with a bottom-up parse. The problem with most bottom-up parsers is they can't tell you anything much more specific than "Something's wrong at line 42."


In reply to Re: Perl 6 rules and complexity by TimToady
in thread Perl 6 rules and complexity by fergal

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.