Incognito has asked for the wisdom of the Perl Monks concerning the following question:

I have a pretty complicated grammar done with Parse::RecDescent (which has bugs I'm sure, but that's another story)... I can't stare at this stuff anymore... My eyes are getting tired. I was only introduced to this stuff 2 days ago... and it's hard to figure out....

Basically, what this does is go through var statement strings line by line and return the var names of each statement it finds.

Code

$grammar = q {
    statement: 'var' <leftop: expressions ',' expressions> ';'
        { $return = $item[2] }

    expressions: var_name ('=' expression)(?)
        { $return = $item[1] }

    expression: conditional_operation
              | arithmetic_operation
              | equality
              | function_call
              | object_declaration
              | numeric_value
              | array_reference
              | array_value
              | object_reference
              | escapedRegex
              | escapedQuote
              | var_name

    operand: var_name | numeric_value

    arithmetic_operation:
          operand arithmetic_operator '(' arithmetic_operation ')'
        | operand arithmetic_operator arithmetic_operation
        | '(' operand arithmetic_operator operand ')'
        | operand arithmetic_operator operand
        | unary_negation_operator operand    # -12
        | incremental_operator operand       # --j
        | operand incremental_operator       # i++

    conditional_operation:
          equality '?' expression ':' expression
        | '(' equality '?' expression ':' expression ')'

    comma_values: <leftop: expression ',' expression>
                | <leftop: numeric_value ',' numeric_value>

    array_reference: array_name '(' expression ')'
                   | array_name '()'

    array_value: array_name '[' array_item ']' ('.' expression)(?)
               | '[' array_list ']'

    array_list: <leftop: array_item ',' array_item>

    array_item: var_name | integer

    object_reference: <leftop: object_name '.' expression>

    object_declaration: 'new' object_name '()'
                      | 'new' object_name '(' comma_values ')'
                      | 'new' object_name

    object_name: 'Array' | 'Object' | 'Date' | /\w+/

    function_call: function_name '()'
                 | function_name '(' comma_values ')'

    function_name: /\w+/

    condition: 'true' | '1' | 'false' | '0' | equality

    equality: '(' expression ')'
            | '(' expression equality_operator expression ')'

    var_name: /\w+/
    array_name: /\w+/
    numeric_value: real_number | integer
    integer: /\d+/
    real_number: /\d+\.?\d*/
    escapedRegex: '__REGEX__'
    escapedQuote: '__QUOTE__'

    # ARITHMETIC OPERATORS
    arithmetic_operator: '+' | '-' | '*' | '/' | '%'
    incremental_operator: '++' | '--'
    unary_negation_operator: '-'

    # OTHER OPERATORS
    string_operator: '+' | '+='
    logical_operator: '&&' | '||' | '!'
    bitwise_operator: '&' | '^' | '|' | '~' | '<<' | '>>' | '>>>'
    equality_operator: '===' | '!==' | '==' | '!=' | '>' | '<' | '>=' | '<='
    assignment_operator: '+' | '-' | '*' | '/' | '%' | '<<' | '>>' | '>>>' | '&' | '^' | '|'
    assignshort_operator: '+=' | '-=' | '*=' | '/=' | '%=' | '<<=' | '>>=' | '>>>=' | '&=' | '^=' | '|='
};

$parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";

foreach my $localDeclaredVar (@localDeclaredVars) {
    my $refParsedValues = $parser->statement($localDeclaredVar);

    unless (defined $refParsedValues) {
        # Report the statement that failed and move on. Written as
        # "statement(...) || print ...", the assignment would store
        # print's return value (1) and push it as if it were a var name.
        print "*** $localDeclaredVar\n";
        next;
    }

    if (ref($refParsedValues) eq 'ARRAY') {
        foreach my $parsedValue (@$refParsedValues) {
            push (@localVariables, $parsedValue) if ($parsedValue);
            #print "==> [$parsedValue]\n";
        }
    }
    else {
        push (@localVariables, $refParsedValues) if ($refParsedValues);
        #print "==> [$refParsedValues]\n";
    }
}

Things I'm having trouble with: not writing left-recursive code... I so much want to make this grammar simpler, but can't get my brain around writing the grammars using (s) and stuff.... For example, the "operand" rule I have should really just be an "expression" (actually a lot of these productions could be eliminated if I could solve this), but I don't know how to express this. My conditional_operation is also pretty flawed...
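For what it's worth, a common way around left recursion in Parse::RecDescent is to layer the rules by precedence and let <leftop: ...> do the repetition, so that an operand can be any tighter-binding expression. What follows is only a minimal sketch under assumptions, not a fix for the grammar above: the rule names (full, additive, multiplicative, unary, postfix, primary, add_op, mul_op, eofile) are made up for illustration, and it only recognizes input without building anything from it.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical rule names; a sketch, not the grammar from the post.
my $grammar = q{
    # Succeeds only if the whole input is one expression.
    full:           expression eofile { $return = 1 }

    expression:     additive ('?' expression ':' expression)(?)

    # Each level refers only to the next, tighter-binding level, so no
    # rule ever calls itself before consuming some input.
    additive:       <leftop: multiplicative add_op multiplicative>
    multiplicative: <leftop: unary mul_op unary>

    unary:          /\+\+|--|-/ unary       # ++i, --j, -12
                  | postfix
    postfix:        primary /\+\+|--/       # i++
                  | primary

    # Parentheses restart at the top, but only after the opening
    # parenthesis has been consumed.
    primary:        '(' expression ')'
                  | /\d+(?:\.\d+)?/
                  | /[A-Za-z_]\w*/

    add_op:         '+' | '-'
    mul_op:         '*' | '/' | '%'
    eofile:         /^\Z/
};

my $parser = Parse::RecDescent->new($grammar) or die "*** Bad grammar!\n";

# The last input is deliberately incomplete.
for my $src ('12 + 13 + 14', '12 + (40 / 5)', '(myTest34) ? 1 : 2', '--j', '12 +') {
    printf "%-20s %s\n", $src, $parser->full($src) ? 'ok' : 'FAILED';
}
```

Parse::RecDescent takes the first production that matches rather than the longest one, which is why the longer alternative is listed first in unary and postfix.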

Another thing I don't know how to do - allow expressions to be surrounded by optional parentheses '(' and ')'... I originally did this:

expression: '(' expression ')'
          | conditional_operation
          | arithmetic_operation
          | equality
          | function_call
          | object_declaration
          | numeric_value
          | array_reference
          | array_value
          | object_reference
          | escapedRegex
          | escapedQuote
          | var_name

but that didn't seem to work (I think this is the recursion issue). I've also noticed that the ORDER in which I place these productions sometimes seems to matter. Yikes.... This grammar is nowhere near complete, but I've got it to a point where it parses almost all the sample input provided below:

Sample Input

var myTest1 = 1;
var myTest2 = 2, myTest3 = 3, myTest4;
var myTest5 = new Array(__QUOTE__,__QUOTE__), myTest6;
var myTest7 =__REGEX__;
var myTest8 = myTest5.x;
var myTest9 = myTest[0], myTest10 = myTest[0];
var myTest11 = (myTest1 == myTest2);
var myTest12 = (myTest1 == myTest2), myTest13 = 2;
var myTest14 = (myTest1 == myTest2), myTest15;
var myTest16 = new Array(1, 2);
var myTest17, myTest18;
var myTest19 = __QUOTE__+ strText +__QUOTE__;
var myTest20 = getDateFromFormat(val,format);
var myTest21 = new Object();
var myTest22 = getDateFromFormat(date2,dateformat2);
var myTest23 = str.substring(i,i+x);
var myTest24 = new Date(year,month-1,date,hh,mm,ss);
var myTest25 = [__QUOTE__,__QUOTE__,__QUOTE__,__QUOTE__,__QUOTE__];
var myTest26 = [1, 2, 3, 4, 5];
var myTest27 = 4 + 5;
var myTest28 = j++;
var myTest29 = ++i;
var myTest30 = -4;
var myTest31 = 12 + (40 / 5);
var myTest32 = 12 + 13;
var myTest33 = 12 + 13 + 14;
var myTest34 = (100 * 20);
var myTest35 = document.all(__QUOTE__);
var myTest36 = document.all(__QUOTE__).value;
var myTest37 = document.all(__QUOTE__).value.toString();
var myTest38 = (myTest34);
var myTest39 = (myTest34) ? 1 : 2;
var myTest40 = a + ((b) ? 1 : 2);
var myTest41 = arySubCookies[j].match(__REGEX__);
var myTest42 = date.getYear() + __QUOTE__;
var myTest43 = date.getMonth() + 1;
var myTest44 = now.getMonth()+1;

Output

My grammar fails to parse only a few of the sample inputs we have here: the failures have to do with arithmetic_operations like "date.getYear() + __QUOTE__" and "now.getMonth()+1"... I want to make all operands an expression but that won't work...

*** var myTest40 = a + ((b) ? 1 : 2);
*** var myTest42 = date.getYear() + __QUOTE__;
*** var myTest43 = date.getMonth() + 1;
*** var myTest44 = now.getMonth()+1;
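All four of these have an operand that is more than a bare name or number: a method call, or a parenthesized conditional. One possible shape, sketched here under assumptions, is to treat an operand as a primary followed by any number of suffixes (member access, call, index). The rule names declaration, arithmetic, suffix, arguments and primary are hypothetical and not part of the grammar above, and the operators are kept as one flat list with no precedence, which is only reasonable because nothing but the declared names is returned.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical rule names; a sketch, not the grammar from the post.
my $grammar = q{
    statement:   'var' <leftop: declaration ',' declaration> ';'
                 { $return = $item[2] }

    declaration: var_name ('=' expression)(?)
                 { $return = $item[1] }

    expression:  arithmetic ('?' expression ':' expression)(?)
    arithmetic:  <leftop: operand arithmetic_operator operand>

    # A primary plus zero or more suffixes is a single operand, so
    # date.getYear() and arySubCookies[j].match(__REGEX__) both fit.
    operand:     primary suffix(s?)
    suffix:      '.' var_name
               | '(' arguments(?) ')'
               | '[' expression ']'
    arguments:   <leftop: expression ',' expression>

    primary:     '(' expression ')'
               | '__QUOTE__'
               | '__REGEX__'
               | /\d+(?:\.\d+)?/
               | var_name

    arithmetic_operator: '+' | '-' | '*' | '/' | '%'
    var_name:    /[A-Za-z_]\w*/
};

my $parser = Parse::RecDescent->new($grammar) or die "*** Bad grammar!\n";

my @failing = (
    'var myTest40 = a + ((b) ? 1 : 2);',
    'var myTest42 = date.getYear() + __QUOTE__;',
    'var myTest43 = date.getMonth() + 1;',
    'var myTest44 = now.getMonth()+1;',
);

for my $stmt (@failing) {
    my $names = $parser->statement($stmt);
    print defined $names ? "==> [@$names]\n" : "*** $stmt\n";
}
```

This sketch leaves out 'new', array literals, the prefix and postfix operators and the equality operators, so it is not a drop-in replacement for the full grammar.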

I guess if I figured out how to rewrite this grammar with fewer rules and somehow get around the left-recursion stuff then I should be doing okay... Does anyone see some things that can be corrected easily in this grammar?

Replies are listed 'Best First'.
Re: Grammar for JavaScript var statements using Parse::RecDescent - ARGH!
by princepawn (Parson) on Mar 01, 2002 at 21:21 UTC