in reply to Marpa -- Partial grammar match does not result in failure
I am not so familiar with the Marpa parser-suite ... yet! ... but from my experience with other tools I find this production to be questionable:
data ::= many_a 'A'
many_a ~ [A]+
This looks to me like a “head-recursive rule,” which could (might? .. here and elsewhere; see the bullet points at the bottom) send a parse down a rabbit hole. In other words, it consists of a production that will either (as the case may be) recurse endlessly or match greedily, followed by a production ('A') that can no longer be matched, because what is supposed to be the terminal token has already been consumed. If the rule calls for “a list of 'A's, followed by 'A',” then the first part will consume all of the 'A's, leaving no 'A' to match the final part of the rule, and so the entire rule will fail.
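To illustrate what I mean outside of Marpa, here is a sketch (my own illustration, not Marpa code) of a lexer that always takes the longest possible match for the [A]+ token before the grammar gets a chance to claim the trailing 'A':

```python
import re

def maximal_munch_lex(text):
    """Simulate a greedy (maximal-munch) lexer: the many_a token ([A]+)
    grabs every 'A' it can before anything else is considered."""
    m = re.match(r"A+", text)           # greedy: consumes every leading 'A'
    return m.group(0), text[m.end():]   # (token, remaining input)

many_a, rest = maximal_munch_lex("AAAA")
print(many_a)  # 'AAAA' -- the [A]+ token swallowed all four letters
print(rest)    # ''     -- nothing left for the trailing 'A', so the rule fails
```

With nothing left over, the parser's request for one final 'A' cannot be satisfied, which is exactly the failure mode described above.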
Contrast this to an equivalent “tail-recursive” rule, which I will now express in a nomenclature that is more familiar to me:
<a_list> ::= 'A' <a_list>*
This rule (in BNF) says that an <a_list> consists of the letter 'A', optionally followed by zero or more <a_list>s. The lexer will be smart enough to know what this really means and how to handle it efficiently. The key difference is that, as soon as that first letter 'A' is seen and matched, the rule is accepted, no matter how many more letter 'A's the input contains.
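A minimal sketch of how that tail-recursive rule behaves in a hand-written recursive-descent parser (again my own illustration, not Marpa code) makes the difference concrete:

```python
def parse_a_list(text, pos=0):
    """<a_list> ::= 'A' <a_list>*
    Succeeds as soon as the first 'A' is matched; any further 'A's
    merely extend the already-accepted list."""
    if pos >= len(text) or text[pos] != 'A':
        return None                 # the rule requires at least one 'A'
    pos += 1                        # first 'A' matched: rule is now accepted
    while pos < len(text) and text[pos] == 'A':
        pos += 1                    # optionally consume more <a_list>s
    return pos                      # position just past the match

print(parse_a_list("AAA"))  # 3 -- all three 'A's consumed, rule accepted
print(parse_a_list("A"))    # 1 -- accepted after the very first 'A'
print(parse_a_list("B"))    # None -- no leading 'A', so the rule fails
```

Once the first 'A' is in hand the rule can never retroactively fail, which is the property the head-first formulation lacks.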
Like I said, I am not too familiar with the Marpa tool yet, although I will make it a point to become so. But it also seems to me that there might be a conceptual issue in your plans with regard to the respective roles of the lexer and the parser in such a system. The lexer's only purpose is to assemble a stream of characters into a stream of tokens, which is then fed to the parser, which therefore never has to deal with individual characters. Some tools blur the distinction between these two concerns, but both concerns are still there.
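That division of labor can be sketched in a few lines (the token names here are hypothetical; any real toolkit defines its own):

```python
import re

# Lexer: characters in, (type, value) tokens out; the parser never sees characters.
TOKEN_SPEC = [("NUMBER", r"\d+"), ("PLUS", r"\+"), ("SKIP", r"\s+")]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, text[pos:])
            if m:
                if name != "SKIP":          # whitespace is discarded, not emitted
                    tokens.append((name, m.group(0)))
                pos += m.end()
                break
        else:
            raise ValueError(f"unexpected character at position {pos}")
    return tokens

print(tokenize("1 + 23"))  # [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '23')]
```

Whatever grammar rules come next operate purely on that token stream; whether one tool or two produces it, the lexing concern is still present.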
The “tail-recursive rule” issue will be present at both conceptual levels, lexer and parser. In some classic systems, such as lex/yacc, an error in this area would produce, for example, the famous message: shift/reduce conflict.
Now, having said all that, I see the following notes on the Marpa website (http://jeffreykegler.github.io/Marpa-web-site/):
(!!)
- Marpa parses anything you can write in BNF, including ambiguous and even infinitely ambiguous grammars.
- Marpa easily and efficiently handles both left- and right-recursion.
- If a grammar is in one of the classes in practical use today, Marpa parses it in O(n) (linear) time. Marpa never goes exponential. Worst case, even for wildly ambiguous grammars, is O(n³) (cubic) time.
- Marpa's run-time error detection is revolutionary. Marpa has complete situational awareness. It knows at all times which rules it is attempting to apply, how far it has progressed in them, and exactly which tokens it can accept. And Marpa can communicate its situational awareness back to the application.
- Marpa allows the application to efficiently retry rejected input. This, combined with its situational awareness, allows “Ruby Slippers” parsing: when an application has a token rejected, it can ask the parse engine which tokens would be acceptable. It can then create a virtual token that allows the parse to continue – the application can make the parse engine's "wishes" come true. The Ruby Slippers are easy to use and surprisingly wide in their application.
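The “Ruby Slippers” idea can be sketched with a toy parser (entirely my own mock-up – this is not the Marpa API, just the general shape of the technique):

```python
def parse(tokens):
    """Toy grammar: statement ::= WORD ';'. Returns ('ok', statements) on
    success, or ('fail', position, expected_token_type) on rejection."""
    i, statements = 0, []
    while i < len(tokens):
        if tokens[i][0] != "WORD":
            return ("fail", i, "WORD")
        word = tokens[i][1]
        i += 1
        if i == len(tokens) or tokens[i][0] != "SEMI":
            return ("fail", i, "SEMI")   # the parser knows exactly what it could accept
        i += 1
        statements.append(word)
    return ("ok", statements)

def ruby_slippers(tokens):
    """Driver: on rejection, ask what would be acceptable and 'wish' it into being."""
    while True:
        result = parse(tokens)
        if result[0] == "ok":
            return result[1]
        _, pos, expected = result
        if expected != "SEMI":
            raise SyntaxError(f"expected {expected} at position {pos}")
        tokens = tokens[:pos] + [("SEMI", ";")] + tokens[pos:]  # insert a virtual token

tokens = [("WORD", "foo"), ("WORD", "bar"), ("SEMI", ";")]
print(ruby_slippers(tokens))  # ['foo', 'bar'] -- a virtual ';' rescued 'foo'
```

The real engine is of course far more capable, but the loop above is the essential move: rejected token, query the acceptable set, supply a virtual token, continue.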
All in all, therefore, a remarkably different parser, but also one that would seem to impose a lot of requirements to go along with its abilities. (The last two bullet points, especially.) Verr-r-r-ry interesting . . . (And I find myself idly pondering ... do I want my parser to be “incredibly smart,” or would I prefer it to be fast-but-stupid?) :-/