campugnatus has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks!

I'm working with Parse::RecDescent and I've run into a problem. At first glance it seems this problem is already covered in the official FAQ: How to commit in the optional subrule.

However, it doesn't seem to me that this trick gets the job done. Here's code that shows what's wrong:

#!/usr/bin/perl
use Parse::RecDescent;

my $grammar = q(
    myrule    : <rulevar: local $failed>
    myrule    : 'stuff' mysubrule(?) <reject:$failed>
              | <error>
    mysubrule : 'ID' <commit> '[' ']'
              | <error?> { $failed++ }
);

my $text   = "stuff ID something";
my $parser = Parse::RecDescent->new($grammar) or die "hi";
my $tree   = $parser->myrule($text);
print $tree;

In this case parsing succeeds: mysubrule is committed and the error directive is triggered, but the error message is never shown because mysubrule is optional in myrule.

If I change $text to "stuff something", parsing fails: mysubrule isn't committed, the error is not triggered, $failed is increased, and finally the reject is triggered.

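However, from where I'm standing, such behavior makes no sense. What I'm trying to get is the opposite: $failed should be increased (and the parse rejected) only when mysubrule gets committed and then fails, while text that has no 'ID' after 'stuff' should still parse.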

Is there a way to achieve that?

I thought of asking the FAQ maintainer, but its git repository hasn't changed in about two years, so I assume it is no longer maintained. Thanks for your help, and sorry for my English.

Re: Parse::Recdescent optional subrule commit
by Anonymous Monk on Apr 30, 2012 at 11:23 UTC

    I would say this is a bug, but I can't really be sure :)

    According to my reading of http://search.cpan.org/~jtbraun/Parse-RecDescent-1.967009/lib/Parse/RecDescent.pm#Rejecting_a_production, if you replace <reject:$failed> with the equivalent

    <reject: defined $failed> { print "WHAT!\n"; exit }

    then the program prints "WHAT!" and exits.

    So it could be a bug, or it could be the gotcha described at http://search.cpan.org/~jtbraun/Parse-RecDescent-1.967009/lib/Parse/RecDescent.pm#1._Expecting_an_error_to_always_invalidate_a_parse

    Update: It probably is a bug, because the true equivalent would return undef on failure rather than exit, and when I do that, the error propagates. So it is probably a bug in the <reject> handling or something... Hooray, I can't fix it :)

    #!/usr/bin/perl --
    use Parse::RecDescent;

    my $grammar = q(
        myrule    : <rulevar: local $failed>
        myrule    : 'stuff' mysubrule(?) <reject: defined $failed> { print "WHAT ($failed)!\n"; undef }
                  | <error>
        mysubrule : 'ID' <commit> '[' ']'
                  | <error?> { $failed++ }
    );

    my $parser = Parse::RecDescent->new($grammar) or die "hi";

    for my $text (
        "stuff ID something",
        "stuff something",
    ){
        print "text => $text\n";
        my $tree = eval { $parser->myrule($text) };
        warn $@ if $@;
        use Data::Dump qw/ dd /;
        dd $tree;
    }

    __END__
    text => stuff ID something
    WHAT ()!
    ERROR (line 1): Invalid mysubrule: Was expecting '[' but found "something" instead
    ERROR (line 1): Invalid myrule: Was expecting 'stuff'
    undef
    text => stuff something
    ERROR (line 1): Invalid myrule: Was expecting 'stuff'
    undef

    Update: I think this might be a related issue: Bug #62892 for Parse-RecDescent, "failed subrules eat text".
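The exit-versus-undef distinction in the update above rests on a documented rule: an action whose value is undef fails its production, and a <reject:...> whose condition is true does the same, after which Parse::RecDescent rewinds the text and tries the next production. A minimal sketch, assuming Parse::RecDescent is installed from CPAN; the rule names via_action and via_reject are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# In both hypothetical rules the first production matches 'a' and is then
# failed on purpose, so the parser backtracks to the second production:
#   via_action - the action evaluates to undef
#   via_reject - <reject:...> is given a true condition
my $grammar = q{
    via_action : 'a' { undef }
               | 'a' { 'fallback' }
    via_reject : 'a' <reject: 1>
               | 'a' { 'fallback' }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

for my $rule (qw( via_action via_reject )) {
    my $result = $parser->$rule('a');
    print "$rule: ", (defined $result ? $result : 'undef'), "\n";
}
```

Both rules should report 'fallback'. An action that calls exit, as in the first experiment, ends the program before any of this backtracking can take place, which is why it is not a faithful stand-in for <reject>.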

      Thanks for your attention!

      I don't see anything here that looks like a bug. Your program prints "WHAT ()!", which is exactly what it should print when parsing "stuff ID something": because 'ID' follows 'stuff', mysubrule gets committed, so $failed is not increased and <error?> is triggered instead.

      Anyway, a bug is not the issue here. As far as I can see, my example works exactly as written, with no unexpected behavior. However, that behavior does not solve the problem that bothers me, which is the same one the FAQ asks about.

      This is not even about <reject>; it is about the fact that $failed is increased only when mysubrule is NOT committed, which seems pointless! I think $failed should be increased only when mysubrule gets committed and then fails. That is, <error?> and { $failed++ } should always happen together, and only then. And I don't see any way to make that happen.

        I suspected I was completely off base :)

Re: Parse::Recdescent optional subrule commit
by locked_user sundialsvc4 (Abbot) on Apr 30, 2012 at 14:08 UTC

    My expectation is that the package definitely is maintained, and I believe there is an error in your assumptions about the grammar. Without delving too deeply into this (though I have spent more than a year working magic with this package): if your grammar says that myrule consists of 'stuff' optionally followed by mysubrule, and the string begins with the tokens 'stuff' 'ID', then myrule succeeds on 'stuff' alone. You said to <commit> after reading 'ID', and the subrule did not fail until after that. You also said that the subrule was optional, thereby saying that its failure was okay: the failure of an optional subrule means that it is not present, and your rule for myrule contains no subsequent condition that must be met. (This, BTW, is why programming-language statements are traditionally obliged to end with a semicolon. Generally speaking, when something is “optional,” you ought to stipulate that something that is not optional must follow it.)

    What you probably intend to do is to say that myrule : 'stuff' something_else, where something_else consists of several alternatives, one of which is the (now, non-optional) subrule, among others (the last of which might be an error-case).   You are now saying, as I think you always intended to be saying, that the grammar must match one of the available alternatives, or declare that an error has occurred.   I do not think that you actually mean for the rule to be “optional” at all.
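A minimal sketch of that restructuring, assuming Parse::RecDescent from CPAN. The rule something_else and the end-of-input pattern /^\Z/ are illustrative choices, not part of the original grammar:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical restructuring: 'stuff' must be followed by something_else,
# which is the (no longer optional) subrule, end of input, or an error.
my $grammar = q{
    myrule         : 'stuff' something_else
    something_else : mysubrule
                   | /^\Z/
                   | <error>
    mysubrule      : 'ID' <commit> '[' ']'
                   | <error?> <reject>
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

for my $text ('stuff', 'stuff ID [ ]', 'stuff ID something', 'stuff something') {
    # Test with defined: on 'stuff' alone the rule returns the empty string
    # matched by /^\Z/, which is false but still a successful parse.
    my $result = $parser->myrule($text);
    printf "%-20s %s\n", $text, defined $result ? 'pass' : 'fail';
}
```

With this layout, 'stuff' and 'stuff ID [ ]' should parse and 'stuff ID something' should fail. The price is that 'stuff something' fails as well, because every legal follower of 'stuff' now has to be listed as an alternative.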

    Turning on the tracer is pretty much mandatory when using this module, and its tracer is quite generous.   It will show you, in rather exhaustive detail, what it thought that your grammar meant.   Undoubtedly, this will not be what you intended for that grammar to mean, but it will be a valid interpretation nonetheless.
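For example, tracing is switched on with the documented globals $::RD_TRACE and $::RD_HINT. Setting them before new() typically also shows how the grammar itself was read. A sketch, assuming a cut-down version of the grammar from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Set before the parser is built so grammar construction is traced as well.
$::RD_TRACE = 1;   # trace rule and production attempts to STDERR
$::RD_HINT  = 1;   # print hints about likely grammar mistakes

my $grammar = q{
    myrule    : 'stuff' mysubrule(?)
    mysubrule : 'ID' <commit> '[' ']'
              | <error?> <reject>
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $tree   = $parser->myrule('stuff ID something');
```

On 'stuff ID something' the trace should show mysubrule committing after 'ID', failing at '[', and myrule succeeding anyway with zero repetitions of mysubrule(?).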

    A recursive-descent parser has its “certain peculiarities.”   And grammars, in general, are always a challenge.   Really, the only way to rigorously test them is to build up a rather extensive Test::Most testing suite that it must pass, and to build up and maintain that testing suite as you develop your grammar ... using git or some other version-control system as you go.
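A minimal sketch of such a suite using Test::More, which ships with Perl (Test::Most builds on it). The grammar under test is the "mandatory subrule with an empty last production" variant, similar to one posted later in this thread; the table of inputs is illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use Parse::RecDescent;

# Grammar under test: mysubrule is mandatory, but its last production is
# empty, so "nothing after 'stuff'" still matches unless 'ID' was committed.
my $grammar = <<'__EOI__';
myrule    : 'stuff' mysubrule
          | <error>
mysubrule : 'ID' <commit> '[' ']'
          | <error?> <reject>
          |
__EOI__

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Each case: [ input text, should the parse succeed? ]
my @cases = (
    [ 'stuff',              1 ],
    [ 'stuff ID [ ]',       1 ],
    [ 'stuff something',    1 ],
    [ 'stuff ID something', 0 ],
);

for my $case (@cases) {
    my ($text, $should_pass) = @$case;
    my $result = $parser->myrule($text);
    if ($should_pass) {
        ok( defined $result, "parses: '$text'" );
    }
    else {
        ok( !defined $result, "rejects: '$text'" );
    }
}

done_testing();
```

Adding a row for every reported misparse keeps the grammar honest as it grows.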

        Of that, I do not know.   What I do know is that I have asked impossible things of that module and it did every single thing that I asked with grace and style.   (Parsing hundreds of SAS® programs and TWS® schedule files and Korn shell scripts ... oh my!)

        I sincerely think that the problem lies in your grammar and that a trace output will reveal the answer you seek.   I have scratched my head in a very similar fashion many, many a day.

      Hello! It's becoming clear that I have some trouble putting my thoughts into words :)

      the failure of an optional subrule means that it is not present

      That is indeed how Parse::RecDescent thinks of it. But from a human point of view, the failure of a rule can also mean that it IS present but contains errors that prevent it from matching. A <commit> directive is a good way to tell the parser: "This production is present, I'm sure of it. If it doesn't match, it must contain an error, and you should fail without trying other alternatives, since they certainly won't match either." The only problem with <commit> is that it affects only the other productions of the same rule and has no effect beyond the subrule it appears in. The trick in the FAQ is an attempt to extend the power of <commit> to subrule calls.
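The point that <commit> only affects other productions of the same rule can be checked directly. In this sketch (the rule names outer and inner are hypothetical), inner commits after 'ID' and then fails, which skips inner's own second production, while outer's alternatives are still tried:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# inner: after <commit>, inner's later (non-error) productions are skipped,
#        so inner fails outright on "ID something".
# outer: inner's commit does not carry over to the caller, so outer still
#        falls back to its own second production.
my $grammar = q{
    outer : inner
          | 'ID' /\w+/ { 'outer fallback' }
    inner : 'ID' <commit> '[' ']'
          | 'ID' /\w+/ { 'inner fallback' }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $inner = $parser->inner('ID something');
my $outer = $parser->outer('ID something');
print 'inner: ', (defined $inner ? $inner : 'undef (failed)'), "\n";
print 'outer: ', (defined $outer ? $outer : 'undef (failed)'), "\n";
```

inner should fail, while outer should return 'outer fallback'; that gap between the two is what the FAQ trick tries to bridge.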

      You are now saying, as I think you always intended to be saying, that the grammar must match one of the available alternatives, or declare that an error has occurred. I do not think that you actually mean for the rule to be “optional” at all

      I'm sorry my question was so ambiguous. Once again, as you can see in the FAQ, the point is not to question mysubrule's optionality but to avoid mandatory subsequent conditions, which can be very hard to write in complicated grammars. I understand completely how the grammar in my example works, and it works correctly. I just want to tweak it so that finding an 'ID' means this optional mysubrule is actually present, and its failure definitely indicates a mistake in the text being parsed.

        It offhand seems to me that the behavior you might be seeking is one thing that a recursive-descent parser really does not do:   it does not “back up.”   A parser that is based, say, on Yacc or Bison-type technology can explore an avenue, find it to be fruitless, and then fall back several levels in quest of something else.   An RD parser does that sort of thing with much more difficulty, or not at all.   In particular, if you have used commit then you have cut-off any fallback beyond that point; even if the grammar-rule containing it subsequently “fails.”   The call has already been made:   its presence in the grammar is not a so-called “promise,” but an actual reality that occurs when the element is encountered.

        There are other parser technologies available in the Perl environment, and it may well be that your particular requirements would be more suitable to one of them.   Otherwise, the grammar structures that you must use (such as ones which I suggested) may not entirely represent your conceptual notion of how the language is put together ... but are, in part, shaped by the characteristics of the RD parser.   I encountered this many times in my previously mentioned major parsing project.

Re: Parse::Recdescent optional subrule commit
by ikegami (Patriarch) on May 02, 2012 at 19:46 UTC

    Making the subrule mandatory, while allowing it to match nothing, solves the problem.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature qw( say );

    use Parse::RecDescent qw( );

    my $grammar = <<'__EOI__';
        myrule    : 'stuff' mysubrule
                  | <error>

        mysubrule : 'ID' <commit> '[' ']'
                  | <error?> <reject>
                  |
    __EOI__

    my $parser = Parse::RecDescent->new($grammar)
        or die;

    for my $text ("stuff ID something", "stuff something") {
        say "====================";
        say "$text";
        say "--------------------";
        say $parser->myrule($text) ? 'pass' : 'fail';
    }
    ====================
    stuff ID something
    --------------------
    ERROR (line 1): Invalid mysubrule: Was expecting '[' but found "something" instead
    ERROR (line 1): Invalid myrule: Was expecting mysubrule but found "ID something" instead
    fail
    ====================
    stuff something
    --------------------
    pass

      And below is how to do it with the optional subrule. It even provides better error messages.

      #!/usr/bin/env perl

      use strict;
      use warnings;
      use feature qw( say );

      use Parse::RecDescent qw( );

      # $failed starts at 1 and is cleared both when mysubrule matches and when
      # it fails without committing (no 'ID'), so <reject:$failed> fires only
      # when mysubrule committed on 'ID' and then failed.
      my $grammar = <<'__EOI__';
          myrule    : <rulevar: local $failed = 1>
          myrule    : 'stuff' mysubrule(?) <reject:$failed>

          mysubrule : 'ID' <commit> '[' ']' { $failed = 0; 1 }
                    | <error?> { $failed = 0; } <reject>
      __EOI__

      my $parser = Parse::RecDescent->new($grammar)
          or die;

      for my $text ("stuff ID something", "stuff something") {
          say "====================";
          say "$text";
          say "--------------------";
          say $parser->myrule($text) ? 'pass' : 'fail';
      }
      ====================
      stuff ID something
      --------------------
      ERROR (line 1): Invalid mysubrule: Was expecting '[' but found "something" instead
      fail
      ====================
      stuff something
      --------------------
      pass
        This is elegant and simple! Thank you!