Hello monks,

My ongoing Marpa adventures bring me back with some basic questions. I have traced my problem back to this test case:

use strict;
use warnings;

use Marpa::R2;
use Data::Dumper;

# Example 1
{
    my $g_str = <<'END_GRAMMAR';
:start      ::= data
data        ::= '12' '3'
END_GRAMMAR

    my $g_obj = Marpa::R2::Scanless::G->new({ source  => \$g_str });
    my $p_obj = Marpa::R2::Scanless::R->new({
        grammar => $g_obj,
        trace_values => 1,
        trace_terminals => 1
    });

    my $s = '12';
    $p_obj->read( \$s );

    print "EXAMPLE 1 VALUE:".Dumper($p_obj->value)."\n";
}

# Example 2
{
    my $g_str = <<'END_GRAMMAR';
:start      ::= data
data        ::= many_a 'A'

many_a        ~ [A]+
END_GRAMMAR

    my $g_obj = Marpa::R2::Scanless::G->new({ source  => \$g_str });
    my $p_obj = Marpa::R2::Scanless::R->new({
        grammar => $g_obj,
        trace_values => 1,
        trace_terminals => 1
    });

    my $s = 'AA';
    $p_obj->read( \$s );

    print "EXAMPLE 2 VALUE:".Dumper($p_obj->value)."\n";
}

In example 1, given that the input string is only '12', but the grammar specifies two tokens ('12' and '3'), I would expect the $p_obj->read call to fail to parse. However, no failure occurs.

I do see the first lexeme being accepted in the trace output. But I do not see the second '3' lexeme being accepted or rejected. The $p_obj->value result is undefined, which would make sense if the read did not complete.

If a '3' is appended to the input string, the $p_obj->value call then shows the expected return value.

In example 2, the first lexeme, which accepts multiple 'A' characters, matches both characters, leaving none for the second lexeme, which requires a single 'A' character. The end result here seems to be the same: the $p_obj->read call does not fail, but the value ends up as undef.

This seems to imply a greedy match on the first lexeme that will not allow the second lexeme to match. However, I would again expect a grammar failure during the read call.

So on to my questions.

1) Why does the $p_obj->read call not fail in these two cases?
2) If the read does not fail, why is the value set to undef?
3) In example 2, is there a way to make the grammar match in a non-greedy fashion?

As always, many thanks for the time and insight :)

EDIT: added in the read calls I left out from my copy and paste as Amon pointed out below.

EDIT2: As this question is very Marpa grammar/behavior specific, and less about Perl, I've posted a variant of this question, revised based on the feedback below, to the Marpa Google group here: https://groups.google.com/forum/#!topic/marpa-parser/fZzhxdBDbGk. I sincerely apologize if this is considered bad posting etiquette, but at this point I believe it's probably the more appropriate forum for the question. I will monitor both and make sure what I learn there is posted here, in the hope that it will help others.

EDIT3: See the above link. It turns out that it is incorrect to assume the input string matched the grammar just because the read method did not throw an error.
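
For anyone landing here later, this is a minimal sketch of how I now check for a complete parse in example 1. It relies on the behavior described above: value comes back undef when the input was read without error but no complete parse exists. The helper name parse_or_undef is my own invention for illustration, not part of Marpa.

```perl
use strict;
use warnings;

use Marpa::R2;

# Hypothetical helper (not part of Marpa): returns a reference to the
# parse value, or undef when the input was read but no complete parse exists.
sub parse_or_undef {
    my ($grammar_source, $input) = @_;

    my $g_obj = Marpa::R2::Scanless::G->new({ source  => \$grammar_source });
    my $p_obj = Marpa::R2::Scanless::R->new({ grammar => $g_obj });

    # A read that does not die only means the input so far was acceptable.
    $p_obj->read( \$input );

    # The real test for a match: undef here means "no parse".
    return $p_obj->value;
}

my $g_str = <<'END_GRAMMAR';
:start      ::= data
data        ::= '12' '3'
END_GRAMMAR

for my $s ('12', '123') {
    my $value_ref = parse_or_undef($g_str, $s);
    print "'$s': ", (defined $value_ref ? "complete parse" : "no parse"), "\n";
}
```

The point is to test the result of value with defined rather than relying on read to die.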

EDIT4: As for example 2 above, Jeffrey was able to offer some insight again. The lexer (rules defined with '~') is greedy by nature. The structural rules (defined with '::=') are much better at handling ambiguity. His solution was to move part of the logic into a structural rule. See the post here: https://groups.google.com/forum/#!topic/marpa-parser/6jgQj-MOLGM
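
Here is a sketch of what moving the repetition into a structural rule can look like for example 2. This is my own reconstruction and not necessarily the exact grammar from the linked thread: the lexeme name a and the :default action line are assumptions I added so that the result is easy to inspect. The lexer now only ever matches a single 'A', and the "one or more" logic lives in the structural rule many_a.

```perl
use strict;
use warnings;

use Marpa::R2;
use Data::Dumper;

# Assumed rewrite of example 2: the lexeme 'a' matches exactly one character,
# and the repetition is handled by the structural rule 'many_a'.
my $g_str = <<'END_GRAMMAR';
:default    ::= action => ::array
:start      ::= data
data        ::= many_a a
many_a      ::= a+

a             ~ 'A'
END_GRAMMAR

my $g_obj = Marpa::R2::Scanless::G->new({ source  => \$g_str });
my $p_obj = Marpa::R2::Scanless::R->new({ grammar => $g_obj });

my $s = 'AA';
$p_obj->read( \$s );

# value returns a reference to the result, or undef if there is no parse.
my $value_ref = $p_obj->value;
print "EXAMPLE 2 VALUE:", (defined $value_ref ? Dumper($$value_ref) : "no parse\n");
```

Because the final a is always the last character, the structural rules can decide where many_a ends, which the greedy lexer could not do on its own.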

In reply to Marpa -- Partial grammar match does not result in failure by tj_thompson
