I can't figure out how to get the grammar to properly differentiate between the Association, KeyList, and Scalar assignments within that clause.
You shouldn't have to. The parser will try them all until it finds one that succeeds. The problem is that you wrote it so the first production of Attr (AttrScalar(s?)) will always succeed, so it'll never get to try the 2nd and 3rd productions (AttrKeyList(s?) and AttrAssociation(s?)).
Instead of telling the parser to search for
0 or more of {one of {0 or more AttrScalar} or {0 or more AttrKeyList} or {0 or more AttrAssociation}}
you should be asking for
0 or more of {one of AttrScalar or AttrKeyList or AttrAssociation}
In other words,
Attr: AttrScalar(s?) | AttrKeyList(s?) | AttrAssociation(s?)
should be
Attr: AttrScalar | AttrKeyList | AttrAssociation
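To see the difference in isolation, here is a minimal sketch. The rule names attr_bad, attr_good, num and word are made up for the demonstration and merely stand in for your Attr* rules.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: num and word stand in for AttrScalar and AttrKeyList.
my $parser = Parse::RecDescent->new(<<'EOG') or die "Bad grammar\n";
   attr_bad  : num(s?) | word(s?)
   attr_good : num     | word
   num       : /\d+/
   word      : /[a-z]+/
EOG

# num(s?) happily matches zero times, so the first production of attr_bad
# succeeds with an empty list and word(s?) is never tried.
my $bad  = $parser->attr_bad('abc');

# Here num fails outright, so the parser moves on to word.
my $good = $parser->attr_good('abc');

print "attr_bad  matched ", scalar(@$bad), " item(s)\n";
print "attr_good matched '$good'\n";
```

The repetition belongs on whatever rule refers to the attribute rule, not inside each alternative.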
Your second problem is that "5" in "CounterInterval = 5" doesn't match "string".
You never check if you've reached the end of the string. That's why it returned a parse tree even though it was incomplete.
VCSConfig: statement(s)
should be
VCSConfig: statement(s) /\Z/
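A small sketch of the effect, using a toy grammar rather than yours (the rule names unanchored, anchored and word are invented for the example):

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: two start rules that differ only in the /\Z/ check.
my $parser = Parse::RecDescent->new(<<'EOG') or die "Bad grammar\n";
   unanchored : word(s)      { $item[1] }
   anchored   : word(s) /\Z/ { $item[1] }
   word       : /[a-z]+/
EOG

my $text = 'foo bar 123';   # "123" is not a word

# Stops quietly after "bar" and still reports success.
my $unanchored = $parser->unanchored($text);

# Fails, because /\Z/ cannot match while "123" is left over.
my $anchored = $parser->anchored($text);

print "unanchored: ", (defined $unanchored ? "@$unanchored" : 'parse failed'), "\n";
print "anchored:   ", (defined $anchored   ? "@$anchored"   : 'parse failed'), "\n";
```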
Are you sure that identifiers can't be one character long?
/[a-zA-Z][\w@]+/
should be
/[a-zA-Z][\w@]*/
and
/[a-zA-Z]\w+/
should be
/[a-zA-Z]\w*/
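A quick check outside the grammar shows what the quantifier changes. The anchors and the sample names are only there for the test.

```perl
use strict;
use warnings;

# "+" demands at least one character after the first letter; "*" does not.
my $with_plus = qr/\A[a-zA-Z]\w+\z/;
my $with_star = qr/\A[a-zA-Z]\w*\z/;

for my $name (qw( x vcs njengsunvcs1 )) {
   printf "%-13s plus: %-3s star: %s\n",
      $name,
      ( $name =~ $with_plus ? 'yes' : 'no' ),
      ( $name =~ $with_star ? 'yes' : 'no' );
}
```

Only the one-character name "x" is treated differently by the two patterns.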
It's bad to split a token across multiple rules. The parser skips whitespace before every terminal, so characters inside the token get removed. (See <skip>.)
pathname: dquote(?) /([^"]+)/ dquote(?) { $item[1] }
should be
pathname: /"([^"]+)"/ { dequote($item[1]) }
| /([^"]+)/ { $item[1] }
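Here is a sketch of the characters that go missing, assuming the default skip pattern (optional whitespace before each terminal). The rule names split_token and whole_token are invented for the demonstration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical rules: the same quoted string matched as three terminals
# and as a single terminal.
my $parser = Parse::RecDescent->new(<<'EOG') or die "Bad grammar\n";
   split_token : '"' /[^"]+/ '"' { $item[2] }
   whole_token : /"[^"]+"/       { $item[1] }
EOG

my $text = '"  padded name"';

# Whitespace is skipped before the middle terminal, so the two leading
# spaces inside the quotes are lost.
my $split = $parser->split_token($text);

# A single regex sees the token exactly as written (quotes included).
my $whole = $parser->whole_token($text);

print "split: [$split]\n";
print "whole: [$whole]\n";
```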
Attr: AttrScalar | AttrKeyList | AttrAssociation
is *very* inefficient because all three subrules start with "attribute '='".
"cluster" name
will see the following string as valid
clusterpeanut
You normally want to force a space in there. One way is to match any identifier, then require the identifier to be "cluster".
This problem occurs in a few other places too.
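A minimal sketch of both behaviours follows. The rule names literal_kw and checked_kw are made up for the example. Note that an action only fails when it returns undef, so the check returns undef rather than 0.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical rules comparing a literal keyword with a checked identifier.
my $parser = Parse::RecDescent->new(<<'EOG') or die "Bad grammar\n";
   literal_kw : "cluster" IDENT
                { $item[2] }
   checked_kw : IDENT { $item[1] eq 'cluster' ? 1 : undef } IDENT
                { $item[3] }
   IDENT      : /[a-zA-Z]\w*/
EOG

# The literal matches the first seven characters; "peanut" becomes the name.
my $literal = $parser->literal_kw('clusterpeanut');

# IDENT swallows "clusterpeanut" whole, and the check then rejects it.
my $checked = $parser->checked_kw('clusterpeanut');

# With a space, the checked version still works as intended.
my $spaced = $parser->checked_kw('cluster peanut');

print "literal_kw: ", ( defined $literal ? $literal : 'no match' ), "\n";
print "checked_kw: ", ( defined $checked ? $checked : 'no match' ), "\n";
print "checked_kw: ", ( defined $spaced  ? $spaced  : 'no match' ), "\n";
```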
It's very useful to uppercase tokens and keep them separate. They look similar to other rules, but you'll find yourself treating them a little differently.
I find it much more readable to line up the : and the | of all the rules.
make_parser.pl generates the parser. Run it to create VCSConfigParser.pm.

use strict;
use warnings;

use Parse::RecDescent qw( );

my $grammar = <<'EOG';

   <autotree>

   {
      # These affect the entire parser.
      use strict;
      use warnings;

      sub dequote {
         my $s = $_[0];
         $s =~ s/^"//;
         $s =~ s/"\z//;
         return $s;
      }
   }

   parse      : stmt(s) /\Z/

   stmt       : clause
              | def

   clause     : "include" pathname

   # Pathname may or may not be surrounded by double quotes
   pathname   : STRING
              | BAREWORD

   def        : IDENT def_[ $item[1] ]
                { $item[2] }

   # An action only fails when it returns undef (0 counts as success),
   # so the keyword checks return undef on a mismatch.
   def_       : { $arg[0] eq "cluster" ? 1 : undef } IDENT "(" attr(s?) ")"
              | { $arg[0] eq "system"  ? 1 : undef } IDENT "(" attr(s?) ")"

   attr       : ATTRNAME '=' attr_val

   attr_val   : ident
              | string
              | number
              | key_list
              | assoc_list

   val        : ident
              | string
              | number

   # These aren't inlined because of <autotree>
   ident      : IDENT
   string     : STRING
   number     : NUMBER

   key_list   : '{' <leftop: IDENT /[,;]/ IDENT> '}'

   assoc_list : '{' <leftop: key_value /[,;]/ key_value> '}'

   key_value  : IDENT '=' val

   # === Tokens ===

   IDENT      : /[a-zA-Z]\w*/     { $item[1] }
   ATTRNAME   : /[a-zA-Z][\w@]*/  { $item[1] }
   STRING     : /"(?:[^"]+)"/     { dequote($item[1]) }
   NUMBER     : /\d+/             { $item[1] }

   # Need work.
   BAREWORD   : /(?:[^"]+)/       { $item[1] }

EOG

Parse::RecDescent->Precompile($grammar, 'VCSConfigParser')
   or die("Bad grammar\n");
test.pl, a sample program that uses the parser.
use strict;
use warnings;

use VCSConfigParser qw( );
use Data::Dumper    qw( Dumper );

#$::RD_TRACE = '';

my $vcs_parser = VCSConfigParser->new();
my $vcs_config = do { local $/; <DATA> };
my $tree = $vcs_parser->parse( $vcs_config );
print Dumper $tree;

__DATA__
include "types.cf"
include "LBSybase.cf"
include "OracleTypes.cf"

cluster vcs (
    UserNames = { vcs = X1Nh6WIWs6ATQ }
    Administrators = { vcs }
    CounterInterval = abc
    )

system njengsunvcs1 (
    )

system njengsunvcs2 (
    )
Notes:
def + def_ is an optimization of
def : IDENT { $item[1] eq "cluster" ? 1 : undef } IDENT "(" attr(s?) ")"
    | IDENT { $item[1] eq "system"  ? 1 : undef } IDENT "(" attr(s?) ")"
By eliminating the common prefix of the productions, the parser is sped up.
If you want to allow nesting, change val to attr_val.
In reply to Re: Parse::RecDescent Grammar Questions
by ikegami
in thread Parse::RecDescent Grammar Questions
by gmarler