Hello monks,

My ongoing Marpa adventures bring me back with some basic questions. I have traced my problem back to this test case:

use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# Example 1
{
    my $g_str = <<'END_GRAMMAR';
:start ::= data
data ::= '12' '3'
END_GRAMMAR
    my $g_obj = Marpa::R2::Scanless::G->new({ source => \$g_str });
    my $p_obj = Marpa::R2::Scanless::R->new({
        grammar         => $g_obj,
        trace_values    => 1,
        trace_terminals => 1
    });
    my $s = '12';
    $p_obj->read( \$s );
    print "EXAMPLE 1 VALUE:".Dumper($p_obj->value)."\n";
}

# Example 2
{
    my $g_str = <<'END_GRAMMAR';
:start ::= data
data ::= many_a 'A'
many_a ~ [A]+
END_GRAMMAR
    my $g_obj = Marpa::R2::Scanless::G->new({ source => \$g_str });
    my $p_obj = Marpa::R2::Scanless::R->new({
        grammar         => $g_obj,
        trace_values    => 1,
        trace_terminals => 1
    });
    my $s = 'AA';
    $p_obj->read( \$s );
    print "EXAMPLE 2 VALUE:".Dumper($p_obj->value)."\n";
}

In example 1, given that the input string is only '12', but the grammar specifies two tokens ('12' and '3'), I would expect the $p_obj->read call to fail to parse. However, no failure occurs.

I do see the first lexeme being accepted in the trace output. But I do not see the second '3' lexeme being accepted or rejected. The $p_obj->value result is undefined, which would make sense if the read did not complete.

If a '3' is appended to the input string, the $p_obj->value call then shows the expected return value.

In example 2, the first lexeme that will accept multiple 'A' characters matches both, leaving none for the second lexeme requiring a single 'A' character. The end result here seems to be the same. The $p_obj->read call does not fail, but the value ends up as undef.

This seems to imply a greedy match on the first lexeme that will not allow the second lexeme to match. However, I would again expect a grammar failure during the read call.

So on to my questions.

As always, many thanks for the time and insight :)

EDIT: added in the read calls I left out from my copy and paste as Amon pointed out below.

EDIT2: As this question is very Marpa grammar/behavior specific, and less about Perl, I've posted a variant of this question, revised based on the feedback below, to the Marpa Google group here: https://groups.google.com/forum/#!topic/marpa-parser/fZzhxdBDbGk. I sincerely apologize if this is considered bad posting etiquette, but at this point I believe it's probably the more appropriate forum for the question. I will monitor both and ensure what I learn there is posted here in hopes that it will help others.

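EDIT3: See the above link. It turns out that it is incorrect to assume the input string matched the grammar just because the read method did not throw an error. A read that returns normally only means no input was rejected so far; the parse may still be incomplete, in which case value returns undef.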
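For anyone landing here later, here is a minimal sketch of how the check could look for example 1. The helper name parses_completely is hypothetical (it is not part of Marpa::R2); it only combines the read and value calls already used above, and treats an undefined return from value as "no complete parse".

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical helper (not part of Marpa::R2): true only if $input forms a
# complete parse under the SLIF grammar in $dsl.
sub parses_completely {
    my ($dsl, $input) = @_;
    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });

    # read() dies if the input is rejected outright ...
    return 0 if not eval { $recce->read(\$input); 1 };

    # ... but returning normally does not prove the parse is complete.
    # value() returns undef (rather than a reference) when there is no parse.
    my $value_ref = $recce->value;
    return defined $value_ref ? 1 : 0;
}

# Grammar from example 1 above.
my $dsl = <<'END_GRAMMAR';
:start ::= data
data ::= '12' '3'
END_GRAMMAR

for my $input ('12', '123') {
    my $status = parses_completely($dsl, $input) ? 'parsed' : 'no parse';
    print "'$input': $status\n";
}
```

The point of the sketch is only that success has to be decided from the result of value, not from the absence of an exception in read.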

EDIT4: As for example 2 above, Jeffrey was able to offer some insight again. The lexer (rules defined with '~') is greedy by nature. The structural rules (defined with '::=') are much better at handling ambiguity. His solution was to move part of the logic into a structural rule. See the post here: https://groups.google.com/forum/#!topic/marpa-parser/6jgQj-MOLGM
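As a rough illustration of that idea, the sketch below compares the grammar from example 2 with a variant where the repetition lives in a structural rule. This variant is my own guess at what "moving the logic into a structural rule" might look like and is not necessarily the grammar from the linked thread; the symbol name a and the helper try_parse are assumptions made for the example.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical helper: returns the reference from value(), or undef if the
# input was rejected or did not form a complete parse.
sub try_parse {
    my ($dsl, $input) = @_;
    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    return undef if not eval { $recce->read(\$input); 1 };
    return $recce->value;
}

# Example 2 as posted: the repetition is a lexical rule, so the lexer
# takes both 'A' characters as one many_a token.
my $lexical = <<'END_GRAMMAR';
:start ::= data
data ::= many_a 'A'
many_a ~ [A]+
END_GRAMMAR

# Assumed variant: every 'A' is its own lexeme, and the repetition is a
# structural rule, so the parser decides where many_a ends.
my $structural = <<'END_GRAMMAR';
:start ::= data
data ::= many_a a
many_a ::= a+
a ~ 'A'
END_GRAMMAR

for my $case ( [ lexical => $lexical ], [ structural => $structural ] ) {
    my ($name, $dsl) = @{$case};
    my $status = defined try_parse($dsl, 'AA') ? 'parsed' : 'no parse';
    print "$name grammar on 'AA': $status\n";
}
```

With single-character lexemes the lexer has nothing to be greedy about, and the split between many_a and the trailing a is left to the structural rules.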


In reply to Marpa -- Partial grammar match does not result in failure by tj_thompson
