gmarler has asked for the wisdom of the Perl Monks concerning the following question:
Finally spending time working on a Parse::RecDescent grammar, but am having trouble with low level productions that are very similar, so the wrong one is often picked, causing the parse to fail. Reading the docs for the module indicates that maybe using <score: ...> might be useful, but I'm not clear on how I could take advantage of it.
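The issue seems to be that the statements I'm trying to parse take one of three forms (all three appear in the sample data below):
    scalar:       CounterInterval = 5
    key list:     Administrators = { vcs }
    association:  UserNames = { vcs = X1Nh6WIWs6ATQ }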
Here's example code I've got, with __DATA__ at the end:
```perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

my $grammar = <<'EOG';
<autotree>

VCSConfig: statement(s)

statement: clause | def

clause: "include" pathname    # Include Clause

# NOTE: May not have any attributes...
def: "cluster" name "(" Attr(s?) ")"
   | "system"  name "(" Attr(s?) ")"

# Pathname may or may not be surrounded by double quotes
pathname: dquote(?) /([^"]+)/ dquote(?)
    { $return = $1; }

dquote: /"/

name: /\w+/

Attr: AttrScalar(s?) | AttrKeyList(s?) | AttrAssociation(s?)

AttrScalar:      attribute '=' string
AttrKeyList:     attribute '=' keylist
AttrAssociation: attribute '=' association

attribute: /[a-zA-Z][\w@]+/    # allow '@' in attr name

# NOTE: separator can be either of ',' or ';'
keylist:     '{' <leftop: string /[,;]/ string> '}'
association: '{' <leftop: key_value /[,;]/ key_value> '}'

key_value: string '=' string

string: /[a-zA-Z]\w+/
EOG

my ($vcs_parse)   = Parse::RecDescent->new( $grammar );
my ($vcs_config)  = do { local $/; <DATA>; };   # slurp everything after __DATA__
my ($orig_config) = $vcs_parse->VCSConfig( $vcs_config );
print Dumper $orig_config;

__DATA__
include "types.cf"
include "LBSybase.cf"
include "OracleTypes.cf"

cluster vcs (
    UserNames = { vcs = X1Nh6WIWs6ATQ }
    Administrators = { vcs }
    CounterInterval = 5
)

system njengsunvcs1 (
)

system njengsunvcs2 (
)
```
The include clauses are parsed with no problem, but as soon as I hit the cluster clause, everything starts to break down, because I can't figure out how to get the grammar to properly differentiate between the Association, KeyList, and Scalar assignments within that clause.
Would the <score: ...> directive help me here? Or is there a much simpler way to get the grammar in line?
Note that this is just a small snippet of the config file in my example - the actual file I'm trying to parse is hundreds of lines long and has several other clause types, but they all have the same attribute types I'm trying to parse here - so this isn't really an easy regex problem either.
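One possible direction, offered as a sketch rather than a definitive fix: Parse::RecDescent tries the productions of a rule in order and takes the first one that matches, and a `(s?)` repetition can match zero times, so an alternative like `AttrScalar(s?)` always succeeds and the later alternatives are never reached. The sketch below instead parses the shared `attribute '='` prefix once and orders the right-hand sides from most specific to least specific. The rule name `value`, the hash keys `name`, `kind` and `data`, and the helper `classify` are illustrative names that are not in the original grammar, and the widened `string` pattern is an assumption made so that a bare number such as `5` is accepted.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Sketch grammar for a single attribute assignment (no <autotree> here;
# explicit actions build plain hashes instead).
my $attr_grammar = <<'EOG';
Attr: attribute '=' value
    { $return = { name => $item{attribute}, %{ $item{value} } } }

# Most specific form first, because the first matching production wins.
value: association { $return = { kind => 'association', data => $item[1] } }
     | keylist     { $return = { kind => 'keylist',     data => $item[1] } }
     | string      { $return = { kind => 'scalar',      data => $item[1] } }

attribute: /[a-zA-Z][\w@]*/

association: '{' <leftop: key_value /[,;]/ key_value> '}'
    { $return = $item[2] }

keylist: '{' <leftop: string /[,;]/ string> '}'
    { $return = $item[2] }

key_value: string '=' string
    { $return = [ $item[1], $item[3] ] }

# Assumption: values may be plain numbers such as 5, so digits are allowed.
string: /\w+/
EOG

my $attr_parser = Parse::RecDescent->new($attr_grammar)
    or die "Grammar did not compile\n";

# Return the kind of assignment found in one line, or undef if it does not parse.
sub classify {
    my ($line) = @_;
    my $attr = $attr_parser->Attr($line);
    return $attr ? $attr->{kind} : undef;
}

# Try the three attribute lines from the sample data.
for my $line (
    'UserNames = { vcs = X1Nh6WIWs6ATQ }',
    'Administrators = { vcs }',
    'CounterInterval = 5',
) {
    my $kind = classify($line);
    printf "%-40s %s\n", $line, defined $kind ? $kind : 'no match';
}
```

With the alternatives ordered this way, a failed `association` attempt backtracks to `keylist` and then to `string`, so `<score: ...>` may not be needed for this particular ambiguity. In the full grammar, the `def` rule could keep using `Attr(s?)` with this single `Attr` rule in place of the three `Attr...` variants.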