in reply to Marpa -- Partial grammar match does not result in failure

First, let us realise that your examples don't actually perform a ->read(\$s), but we'll just assume it's somewhere in there.

Secondly, let's be clear about the behaviour of read() and value(): read() returns the position in the input stream where scanning ended. It will throw an error if the input couldn't be parsed e.g. because no expected lexeme matched the input at the current location. It will not throw an error if the input simply has been exhausted. That is, it only throws an error if there was a problem with respect to the low-level grammar which represents the lexer.
The value() method will return either undef if there was no successful parse, or a reference to the result if the parse was successful. As you do not currently specify any semantics, the result would be undef, so we'd end up with a reference to undef in the case of a successful parse. That is, this indicates the success of the high-level grammar which does the actual parsing.

Now let's look at the first example. Here, the input is accepted by the low-level grammar, as the token '12' can be matched. Therefore, read() succeeds. However, the high-level grammar also requires the '3' token, which the input doesn't contain. As the input has ended, the high-level grammar will fail, and value() will be undef, indicating a parse failure.

Now the second example. Here, the low-level grammar provides the lexemes many_a ~ [A]+ and the anonymous 'A'. Due to longest-token matching, the input 'AA' will be completely consumed by the many_a lexeme. This token is accepted by the high-level grammar. Now the input is exhausted, so the token 'A' will not be found.

For efficiency, Marpa never backtracks like Perl regexes do. Especially, Marpa's low-level grammar is equivalent to regular expressions (the computer science thing, not Perl regexes), which can be recognized in linear time. I.e., this is efficient as hell, but not very expressive. If you need more features, you must use the high-level grammar and resolve the ambiguity there, or switch to manual lexing (via pause events … but you'll come to that later).

If you want a successful parse to have a more interesting value(), just add a :default ::= action => [name,values] to your grammar, which will create an array ref for each rule.

Replies are listed 'Best First'.
Re^2: Marpa -- Partial grammar match does not result in failure
by tj_thompson (Monk) on May 08, 2014 at 04:33 UTC

    First, a personal thanks for taking the time to respond Amon. It's clear you have a strong understanding of Marpa and your response to this question and my previous questions have been exceedingly useful. I have been reading through the documentation but I am finding it difficult to find what I need as there is much to absorb.

    You are correct that the 'read' call was in my code, it was missed in my copy/paste. I have edited the main post to reflect this change.

    I had assumed that the behavior here was just something that I did not understand, not that Marpa was incorrect or faulty. Your response confirms this. So a couple of followup questions.

    1) Given this information, it would seem to me that you should be able to differentiate between the parser exhausting the input as opposed to successfully parsing the entire grammar. Is there some way to determine this?

    After a bit of thought, I'm assuming that determining what constitutes 'parsing the entire grammar' would be a difficult problem in and of itself considering the way a parse tree is created. Exhausting the input string is probably a reasonable indicator of a grammar having been parsed successfully.

    2) How would you resolve the problem in example 2 assuming this was part of a more complex grammar? Considering you are locked into a greedy match behavior on a per token basis, this seems difficult to resolve.

      It appears that it is possible to differentiate a failed match vs. an exhausted input string. If the recognizer specifically rejects the string as unable to match, it will throw an error. If the input string is exhausted, but has conformed to the grammar requirements up till that point, the recognizer instead pauses the parse. This allows for the opportunity to add to the input string and resume parsing.

      However, if the full grammar has not been matched and the recognizer is paused, the 'value' method will result in an undef return. An undef return from the 'value' method means there was no parse result, or a failure.

      This is pretty much exactly what Amon was telling me above but I didn't quite put things together until I understood that you could pause the parsing, add to your input, and resume parsing.

      Short story is don't rely on an error from the recognizer to determine whether the grammar was matched. An error clearly indicates a failed match, but no error can mean 'input string was exhausted, but does not yet match fully' or 'grammar was matched successfully'. To differentiate these, ensure the value returned from the 'value' method is defined. Thanks to the Marpa author Jeffrey for the insight on this point https://groups.google.com/forum/#!topic/marpa-parser/fZzhxdBDbGk

Re^2: Marpa -- Partial grammar match does not result in failure
by tj_thompson (Monk) on May 08, 2014 at 16:50 UTC

    After some further testing, I still do not see the value being updated in the case of the partial (but apparently successful) parse. I have added the :default rule you suggest above to two cases. Example 1 has been modified to match all tokens in the grammar( '12' and '3' ). It does return the expected 'name, value' structure. However, Example 2 retains the original string that only matches the first token ('12'). If the parse was successful, I would expect at least the matched tokens to appear in the value data, but it does not appear to exist. I have taken the time to read through the read and value documentation you linked above, but this information does not seem to explain the results I see here (or at least I'm not making sense of it).

    use strict; use warnings; use Marpa::R2; use Data::Dumper; # Example 1 { my $g_str = <<'END_GRAMMAR'; :default ::= action => [name, value] :start ::= data data ::= '12' '3' END_GRAMMAR my $g_obj = Marpa::R2::Scanless::G->new({ source => \$g_str }); my $p_obj = Marpa::R2::Scanless::R->new({ grammar => $g_obj, trace_values => 1, trace_terminals => 1 }); my $s = '123'; $p_obj->read( \$s ); print "EXAMPLE 1 VALUE:".Dumper($p_obj->value)."\n"; } # Example 2 { my $g_str = <<'END_GRAMMAR'; :default ::= action => [name, value] :start ::= data data ::= '12' '3' END_GRAMMAR my $g_obj = Marpa::R2::Scanless::G->new({ source => \$g_str }); my $p_obj = Marpa::R2::Scanless::R->new({ grammar => $g_obj, trace_values => 1, trace_terminals => 1 }); my $s = '12'; $p_obj->read( \$s ); print "EXAMPLE 2 VALUE:".Dumper($p_obj->value)."\n"; }

    Output:

    Setting trace_terminals option Setting trace_values option Lexer "L0" accepted lexeme L1c1-2: '12'; value="12" Lexer "L0" accepted lexeme L1c3: '3'; value="3" EXAMPLE 1 VALUE:$VAR1 = \[ 'data', '12', '3' ]; Setting trace_terminals option Setting trace_values option Lexer "L0" accepted lexeme L1c1-2: '12'; value="12" EXAMPLE 2 VALUE:$VAR1 = undef;

    In example 2, given there is now a :default rule for returning match data from a structural rule, and that the parse was successful, why is there no output from value?

    My guess is that since the 'data' token requires both '12' and '3' to match, 'data' was not successfully matched. This means there is no returned value for the token, resulting in an empty value data structure. But this sounds an awful lot like a grammar parse failure that should result in a ->read failure.

    EDIT: As it turns out, I am making the faulty assumption that because the read method did not throw an error, the parse was successful. This is not necessarily true. See my post above Re^3: Marpa -- Partial grammar match does not result in failure