Your Marpa grammar has a problem. Marpa's Scanless Interface (SLIF) uses a two-level grammar: a simple, efficient lexer grammar breaks the source up into tokens, which are then parsed by the more versatile high-level grammar. Your grammar currently uses only the high-level grammar, which leads to an enormous amount of ambiguity.
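To make the ambiguity concrete, here is a contrived sketch that is not based on your actual grammar: the rule names Words, Word, Letter and WORD and the sample input are made up. It describes the same tiny language twice, once building words out of single characters in the high-level grammar and once with a real lexer rule, and reports Marpa::R2's ambiguity_metric() for each. It assumes a reasonably recent Marpa::R2 from CPAN.

```perl
#!/usr/bin/env perl
# Contrived sketch (not the grammar from the question): one language,
# described with characters as high-level tokens and with a real lexeme.
use strict;
use warnings;
use Marpa::R2;

# One level: words are assembled from single characters in the G1 grammar,
# so "abc" can be split into words as abc, a|bc, ab|c or a|b|c.
my $one_level = <<'END_DSL';
lexeme default = latm => 1
:start ::= Words
Words  ::= Word+
Word   ::= Letter+
Letter ~ [a-z]
END_DSL

# Two levels: the lexer builds a whole WORD, and with longest acceptable
# token matching (latm) it reads "abc" as a single token.
my $two_level = <<'END_DSL';
lexeme default = latm => 1
:start ::= Words
Words ::= WORD+
WORD  ~ [a-z]+
END_DSL

# Returns Marpa's ambiguity metric: 1 means exactly one parse, >1 ambiguous.
sub metric {
    my ( $dsl, $input ) = @_;
    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $recce->read( \$input );
    return $recce->ambiguity_metric();
}

printf "one level:  %d\n", metric( $one_level, 'abc' );
printf "two levels: %d\n", metric( $two_level, 'abc' );
```

With characters as high-level tokens, "abc" has four possible word splits, so the first metric should be greater than 1, while the two-level version should report exactly 1.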

Rules in the lexer grammar are declared with “~” instead of “::=”. A rule declared this way can be used either as a terminal symbol in the high-level grammar or inside another low-level rule. Let's rewrite your grammar accordingly. As a naming convention, I use CamelCase for rules in the high-level grammar, ALL_UPPERCASE for terminal symbols of the high-level grammar, and snake_case for the remaining rules in the low-level grammar.

    inaccessible is fatal by default
    lexeme default = latm => 1

    :start ::= PlistFile

    PlistFile ::= VersionData Ows GlobalPlists

    VersionData ::= 'Version' WS FLOAT Ows ';'

    GlobalPlists ::= GlobalPlist+
    GlobalPlist ::= GLOBAL_PL_DECLARE WS PL_NAME Ows OptPlOptions Ows '{' Ows OptEmbeddedBase Ows Nodes '}' Ows

    OptPlOptions ::= Option*
    Option ::= '[' Ows OPTION_DATA Ows ']' Ows

    OptEmbeddedBase ::= EmbeddedBase*
    EmbeddedBase ::= '#' Ows 'base' Ows '=' Ows BaseNumbers
    BaseNumbers ::= BASE_NUMBER+ separator => COMMA

    Nodes ::= Node+
    Node ::= Pattern | COMMENT | GlobalPlist || ReferencePlist

    Pattern ::= PAT_DECLARE WS PAT_NAME Ows OptPatOption ';' Ows OptTagStr Ows
    OptPatOption ::= Option
    OptPatOption ::=
    OptTagStr ::= TagStr
    OptTagStr ::=
    TagStr ::= '#' Ows TagList Ows '#'
    TagList ::= TAG* separator => COMMA

    ReferencePlist ::= 'PList' WS RefPlName Ows ';' Ows
    RefPlName ::= OptRefFile PL_NAME
    OptRefFile ::= RefFile*
    RefFile ::= FILE_NAME ':'

    Ows ::= WS  # a lexeme cannot have zero length,
    Ows ::=     # so optional whitespace must be a high-level grammar feature

    WS ~ ws
    COMMA ~ ','
    COMMENT ~ '#' comment_chars [\n] | '#' comment_chars [\n] ws
    FLOAT ~ int | int '.' int
    BASE_NUMBER ~ int
    PL_NAME ~ identifier
    TAG ~ identifier
    PAT_NAME ~ identifier
    PAT_DECLARE ~ 'Pat' | 'Pattern'
    GLOBAL_PL_DECLARE ~ 'GlobalPList' | 'LocalPList' | 'PatternList'
    FILE_NAME ~ [\w.]+
    OPTION_DATA ~ [\w \.,]*
    ws ~ [\s]+
    identifier ~ [\w]+
    comment_chars ~ [^\n]+
    int ~ [\d]+
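Here is a minimal sketch of how such a grammar might be driven from Perl, trimmed to the VersionData rule so it stays self-contained. The `:default ::= action => [values]` line, the helper name parse_version() and the sample input are my assumptions, not part of your grammar; it also assumes a Marpa::R2 recent enough to provide ambiguity_metric() and ambiguous() on the recognizer.

```perl
#!/usr/bin/env perl
# Sketch: only the VersionData part of the grammar, driven through
# Marpa::R2's Scanless interface. The [values] default action and the
# parse_version() helper are illustrative assumptions.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_DSL';
lexeme default = latm => 1
:default ::= action => [values]

:start ::= VersionData
VersionData ::= 'Version' WS FLOAT Ows ';'
Ows ::= WS
Ows ::=

WS    ~ ws
FLOAT ~ int | int '.' int
ws    ~ [\s]+
int   ~ [\d]+
END_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Parses a version header and returns the FLOAT lexeme as a string.
sub parse_version {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $recce->read( \$input );    # throws on a lexing error

    # 1 = exactly one parse, >1 = ambiguous, 0/undef = no parse
    my $metric = $recce->ambiguity_metric();
    die "no parse\n" unless $metric;
    die "ambiguous parse:\n", $recce->ambiguous() if $metric > 1;

    # With [values], VersionData evaluates to an array ref of its RHS values:
    # ['Version', WS, FLOAT, Ows, ';'], so index 2 is the FLOAT text.
    my $value_ref = $recce->value();
    return ${$value_ref}->[2];
}

print parse_version('Version 1.0 ;'), "\n";
```

Checking the ambiguity metric before calling value() is a cheap way to catch a grammar that still allows several parses, instead of silently getting one of them.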

Notice also that a “*” quantifier means “zero or more repetitions”; it does not merely make a symbol optional. To make a symbol optional (zero or one occurrence), add an empty production such as Rule ::=. This has to happen in the high-level grammar, because lexemes cannot have zero length: they must always consume at least one character.
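A small sketch of the difference, assuming Marpa::R2: two toy grammars over the single letter a, where the names Items, OptItem and ITEM and the helper accepts() are made up for illustration. The first uses “*”, the second an empty production. Reading "aa" should succeed with the first and fail with the second, because once one ITEM has been read the optional rule is complete and leaves nothing for the second a to attach to.

```perl
#!/usr/bin/env perl
# Sketch: "*" means zero or more, an empty alternative means zero or one.
use strict;
use warnings;
use Marpa::R2;

# Zero or more ITEMs.
my $star_dsl = <<'END_DSL';
lexeme default = latm => 1
:start ::= Items
Items ::= ITEM*
ITEM ~ 'a'
END_DSL

# Zero or one ITEM, via an empty production.
my $opt_dsl = <<'END_DSL';
lexeme default = latm => 1
:start ::= OptItem
OptItem ::= ITEM
OptItem ::=
ITEM ~ 'a'
END_DSL

# True if the grammar yields a complete parse of $input.
sub accepts {
    my ( $dsl, $input ) = @_;
    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    # read() throws if the input cannot be consumed, e.g. a second 'a'
    # after the optional rule has already been satisfied
    return 0 unless eval { $recce->read( \$input ); 1 };
    return defined $recce->value() ? 1 : 0;
}

for my $input (qw(a aa aaa)) {
    printf "%-3s  star: %d  optional: %d\n", $input,
        accepts( $star_dsl, $input ), accepts( $opt_dsl, $input );
}
```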

Because this grammar is tidier and pushes as much work as possible into the more efficient low-level grammar, it should also use less memory. However, two problems remain:


In reply to Re: Grammar based parsing methodology for multi MB strings/files (Marpa::R2/Regexp::Grammars) by amon
in thread Grammar based parsing methodology for multi MB strings/files (Marpa::R2/Regexp::Grammars) by tj_thompson