Backtracking in Parse::RecDescent

ruscoekm has asked for the wisdom of the Perl Monks concerning the following question:

Given the following code:

use strict;
use warnings;
use Parse::RecDescent;

my $grammar = 'startrule: ( "aa" | "a" ) "a"';
my $parser  = Parse::RecDescent->new($grammar);
my $text    = 'aa';

print defined($parser->startrule($text)) ? "Good!\n" : "Bad!\n";
[download]

the output is "Bad!". The reason is that the first branch of the alternation matched, then the next subrule failed.

Now, I understand from the P::RD manpage that bottom-up parsers try all possible matches and select the longest one, so I guess the yacc equivalent would have worked. However, I still found the behaviour of P::RD a little surprising.

My question is: Is there any way to persuade a top-down parser like P::RD to accept the above text as valid? I know that, in this simple example, I could easily rewrite the grammar (either by saying ("a" | "aa" ) "a" or "aa" a" | "a" "a"). What I mean is: Is there any additional feature I have missed which would allow the grammar as is to parse the text successfully?

To put the question another way, can I get P::RD to behave more like a regex engine? After all, even an NFA engine would backtrack to try all possible alternatives before failing :-) (Perhaps parsers just do not backtrack past individual subrules under any circumstances.)

Comment on Backtracking in Parse::RecDescent Select or Download Code

Replies are listed 'Best First'.
Re: Backtracking in Parse::RecDescent by gjb (Vicar) on Dec 06, 2002 at 13:18 UTC
The behavior is quite okay. The point is that `( "aa" \| "a" )` and `"a"` represents two tokens, separated by whitespace. Hence '`aa a`' is valid input, as well as '`a a`', but '`aa`' is not since it's a single token and the grammar specifies two. Hope this clears the confusion, -gjb- Update/clarification: The assumption that " The reason is that the first branch of the alternation matched, then the next subrule failed." is simply wrong. The string 'aa' is not valid because it simply not part of the language described by the grammar. That language consists of two strings: 'aa a' and 'a a'. The grammar `( "aa" \| "a" ) "a"` specifies that an 'aa' should be followed by a 'a' token separated by whitespace or that an 'a' should be followed by an 'a' token separated by whitespace. The example you give 'aa' has no whitespace, hence only one token while your grammar requires two.	[reply] [d/l] [select]
Re: Re: Backtracking in Parse::RecDescent by ruscoekm (Monk) on Dec 06, 2002 at 13:33 UTC
I don't think that explains why the grammar `( "a" \| "aa" ) "a"` works. I believe that in P::RD, the default terminal prefix is optional white space. This is why `( "a" \| "aa" ) "a"` matches.	[reply] [d/l] [select]
Re: Backtracking in Parse::RecDescent (solution) by gjb (Vicar) on Dec 06, 2002 at 22:37 UTC
After digging into the Parse::RecDescent docs, I found out that the mechanism to alter Parse::RecDescent's preference for the first match is the `<score: ...>` directive. The code below illustrate how it solves your case. `use strict; use warnings; use Parse::RecDescent; my $grammar = q/ startrule: choice "a" choice: "a" "a" <score: -2> \| "a" <score: -1> /; my $parser = Parse::RecDescent->new($grammar); while (<DATA>) { chomp($_); print defined($parser->startrule($_)) ? "Good!\n" : "Bad!\n"; } __DATA__ a a a a a` [download] Now the result is "Good!" for both 'a a a' and 'a a' as desired Hope this helps, -gjb-	[reply] [d/l] [select]
Re: Re: Backtracking in Parse::RecDescent (solution) by ruscoekm (Monk) on Dec 06, 2002 at 22:50 UTC
Thanks, that's a good answer. However, it does not quite do what I want, which is for the parser to try all possible alternatives in a production before failing. I suspect that the answer is that top-down parsers just do not work that way.	[reply]
Re: Re: Re: Backtracking in Parse::RecDescent (solution) by Anonymous Monk on Dec 07, 2002 at 01:47 UTC
The answer is that RecDescent parsers do not work that way. They don't backtrack on failure; they just fail. Of course, there's nothing to prevent a recursive descent parser from incorporating backtracking too, but RecDescent doesn't. So, if you need backtracking in part of your grammar, you need to use plain old regexes there. Sorry. Damian	[reply]
Re: Re: Re: Re: Backtracking in Parse::RecDescent (solution) by ruscoekm (Monk) on Dec 09, 2002 at 08:15 UTC
Re: Backtracking in Parse::RecDescent by demerphq (Chancellor) on Dec 06, 2002 at 21:38 UTC
Please post this question to the Parse::RecDescent mailing list. `recdescent at perl.org` [download] UPDATE: Reason being that I think there is more likelyhood that TheDamian or someone equally informed will reply there. Also its an interesting question that should go in the P::RD mailing list archives. To whomever downvoted me on this: Get a life. --- demerphq my friends call me, usually because I'm late....	[reply] [d/l]
Re: Re: Backtracking in Parse::RecDescent by ruscoekm (Monk) on Dec 09, 2002 at 08:35 UTC
Thanks for the suggestion. I have posted the question (and the answer from Damian) on the P::RD mailing list.	[reply]
Re: Backtracking in Parse::RecDescent by pg (Canon) on Dec 06, 2002 at 16:48 UTC
Parse::RecDescent basicaly follows yacc. I know a very descent page on this topic, it might help: yacc.	[reply]
Re: Re: Backtracking in Parse::RecDescent by ruscoekm (Monk) on Dec 06, 2002 at 16:57 UTC
Thanks for the link, but I am not sure I understand you. lex is a lexer/tokener, while yacc is a parser. P::RD fulfils both functions. My reference to regex engines was only an analogy. The question concerns specifically the scope of backtracking in top-down parsers.	[reply]