http://qs1969.pair.com?node_id=553889

Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent

Synopsis

I have found documentation on eliminating left-recursion (such as Eliminating Left Recursion in Parse::RecDescent) to be unsatisfactory. Left recursion is usually eliminated at the expense of associativity. This tutorial seeks to address this issue.

The document provides two implementations for every topic covered. The first shows how the topic applies when evaluating the text at parse time. The second shows how the topic applies when building a parse tree. It is probably best to ignore the latter (parse tree creation) until the former (parse-time eval) is understood.

Feedback and criticisms are welcome.

Table of Contents

1. What is Operator Associativity?

The Perl binary operators + and - have the same precedence, but that doesn't mean they can be evaluated in any order. For example, consider 4 - 5 + 6.

If executed from left-to-right, 4 - 5 + 6 = (4 - 5) + 6 = 5 If executed from right-to-left, 4 - 5 + 6 = 4 - (5 + 6) = -7

Similarly,

If executed from left-to-right, 4 ** 3 ** 2 = (4 ** 3) ** 2 = 4096 If executed from right-to-left, 4 ** 3 ** 2 = 4 ** (3 ** 2) = 262144

Operators which are evaluated from left-to-right are left-associative.

Operators which are evaluated from right-to-left are right-associative.

In Perl, binary operators + and - are left-associative, and binary operator ** is right-associative. (Refer to Operator Precedence and Associativity in perlop for the associativity of other operators.)

2. Parsers and Associativity

Grammars do not specify associativity. A grammar simply defines whether a given string is valid in the language represented by the grammar, and associativity is not needed for that purpose.

However, we're rarely just interested in validity check. Parsers that return a parse tree representing the text being parsed and those that evaluate the text being parsed are much more useful. Because Parse::RecDescent processes rules from left to right, grammars can be written in a form that lends itself well to doing these tasks.

Left-associative:

sum : sum /[+-]/ NUM | NUM

Right-associative:

pow : NUM '**' pow | NUM

The following subsections will enrich these grammars with code to build a parse tree and to evaluate the expression at parse-time. As you will see, no changes will be needed to the grammar.

2.a. Parse-time Evaluation with Associativity

Left-associative:

sum : sum '+' NUM { $item[1] + $item[3] } | sum '-' NUM { $item[1] - $item[3] } | NUM { $item[1] }

Right-associative:

pow : NUM '**' sum { $item[1] ** $item[3] } | NUM { $item[1] }

2.b. Building a Parse Tree with Associativity

Left-associative:

sum : sum /[+-]/ NUM { [ @item[2,1,3] ] } | NUM { [ $item[1] ] }

Right-associative:

pow : NUM '**' pow { [ @item[2,1,3] ] } | NUM { [ $item[1] ] }

3. Eliminating Left-Recursion

There is a catch. The theory is solid, but parsers have limitations.

Productions of the form a : a b are called left-recursive. An entire class of parser generators cannot process left-recursive grammars, and Parse::RecDescent belongs to that class. Unfortunately, the left-associative rules presented so far are left-recursive. The remainder of this section will show methods of removing left-recursion from grammars for Parse::RecDescent.

3.a. Method 1: Create a Flat List, and Reconstruct

It's easy to parse 4 - 5 + 6 into the list '4', '-', '5', '+', '6'. The following snippet does so:

sum : NUM sum_ { [ $item[1], @{$item[2]} ] } sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] } | { [] }

If we are evaluating at parse-time, we have little choice but to process the sum as a list rather than a binary operator. When building a parse tree, we have two options. We could leave it as is, or we could convert the list into a tree.

The following subsections show how to evaluate the list and how to treeify it.

3.a.i. ...to Evaluate the Text at Parse-time

{ sub eval_sum { my $acc = shift(@_); while (@_) { my $op = shift(@_); if ($op eq '+') { $acc += shift(@_); } elsif ($op eq '-') { $acc -= shift(@_); } } return $acc; } } sum : NUM sum_ { eval_sum($item[1], @{$item[2]}) } sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] } | { [] }

3.a.ii. ...to Build a Parse Tree

{ sub treeify { my $t = shift(@_); $t = [ shift(@_), $t, shift(@_) ] while @_; return $t; } } sum : NUM sum_ { treeify($item[1], @{$item[2]}) } sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] } | { [] }

3.b. Method 2: Create a Flat List Using <leftop>, and Reconstruct

This method is the same as Method 1, but takes advantage of a Parse::RecDescent feature to improve readability. Parse::RecDescent has a pair of directives to help build lists. <leftop> is designed to build left-associative lists, and <rightop> is designed to build right-associative lists.

3.b.i. ...to Evaluate the Text at Parse-time

{ sub eval_sum { my $acc = shift(@_); while (@_) { my $op = shift(@_); if ($op eq '+') { $acc += shift(@_); } elsif ($op eq '-') { $acc -= shift(@_); } } return $acc; } } sum : <leftop: NUM /[+-]/ NUM> { eval_sum(@{$item[1]}) }

3.b.ii. ...to Build a Parse Tree

{ sub treeify { my $t = shift(@_); $t = [ shift(@_), $t, shift(@_) ] while @_; return $t; } } sum : <leftop: NUM /[+-]/ NUM> { treeify(@{$item[1]}) }

3.c. Method 3: Using a Subrule Argument

Normally, information passes from subrule to superrule. For example, in the following code, rule2 receives the result of rule3. In turn, rule1 receives the result of rule2.

rule1: token rule2 rule2: token rule3 rule3: token

The deeper something is, the sooner it will get executed. In a list, that means the last (right-most) element encountered will be executed first. With left-associative lists, the opposite is needed. With left-associative lists, information needs to flow from the superrule to the subrule. Fortunately, Parse::RecDescent provides a means of passing information to subrules: Subrule argument lists.

Think of each rule as a function, and of each reference to that rule as a function call. (In fact, this is how the compiled grammars are implemented.) Just like functions can have arguments, so can subrules.

3.c.i. ...to Evaluate the Text at Parse-time

sum : NUM sum_[ $item[1] ] sum_ : '+' NUM sum_[ $arg[0] + $item[2] ] | '-' NUM sum_[ $arg[0] - $item[2] ] | { $arg[0] }

3.c.ii. ...to Build a Parse Tree

sum : NUM sum_[ $item[1] ] sum_ : '+' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ] | '-' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ] | { $arg[0] }

4. Improving Right-Recursion

Earlier, we ended up with the following rules for right-recursive binary operators:

pow : NUM '**' pow | NUM

Unlike left-recursion, Parse::RecDescent has no problem with right-recursion. However, Parse::RecDescent handles rules with productions with identical prefixes very inefficiently.

Just like in algebra, we can factor out the common prefix into another rule.

pow : NUM pow_ pow_ : '**' pow |

The complicated part is how to evaluate the expression or build the parse tree when one of the operands is matched by one rule, and the other is matched by a different rule. It turns out that doing this is very similar to eliminating left-recursion.

4.a. Method 1: Create a Flat List, and Reconstruct

Just like when eliminating left-recursion, we can build a flat list of the whole chain of powers, and work with that. The difference is that the list will be processed from right to left.

4.a.i. ...to Evaluate the Text at Parse-time

{ sub eval_pow { my $acc = pop(@_); while (@_) { my $op = pop(@_); $acc = pop(@_) ** $acc; } return $acc; } } pow : NUM pow_ { eval_pow($item[1], @{$item[2]}) } pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] } | { [] }

4.a.ii. ...to Build a Parse Tree

{ sub treeify_r { my $t = pop; $t = [ pop, pop, $t ] while @_; return $t; } } pow : NUM pow_ { treeify_r($item[1], @{$item[2]}) } pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] } | { [] }

4.b. Method 2: Create a Flat List Using <rightop>, and Reconstruct

Just like Parse::RecDescent has a directive for creating a flat list for a left-associative operator (<leftop>), it has one to create a flat list for a right-associative operator (<rightop>).

4.b.i. ...to Evaluate the Text at Parse-time

{ sub eval_pow { my $acc = pop(@_); while (@_) { my $op = pop(@_); $acc = pop(@_) ** $acc; } return $acc; } } pow : <rightop: NUM /(\*\*)/ NUM> { eval_pow(@{$item[1]}) }

4.b.ii. ...to Build a Parse Tree

{ sub treeify_r { my $t = pop; $t = [ pop, pop, $t ] while @_; return $t; } } pow : <rightop: NUM /(\*\*)/ NUM> { treeify_r(@{$item[1]}) }

4.c. Method 3: Using a Subrule Argument

Let's look at the algebra again. We can change

pow : NUM '**' pow { $item[1] ** $item[3] } | NUM { $item[1] }

into

pow : NUM pow_ pow_ : '**' pow { <<pow's $item[1]>> ** $item[2] } | { <<pow's $item[1]>> }

The problem is that we have to pass $item[1] from pow to pow_. We've already seen that we can pass data from one rule to another using subrule arguments. When eliminating left-recursion, we used the subrule argument to form a stack. When improving right-recursion, we simply pass from the main rule to the helper rule.

4.c.i. ...to Evaluate the Text at Parse-time

pow : NUM pow_[ $item[1] ] pow_ : '**' pow { $arg[0] ** $item[2] } | { $arg[0] }

4.c.ii. ...to Build a Parse Tree

pow : NUM pow_[ $item[1] ] pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] } | { $arg[0] }

5. Working Code

The following subsections contain complete, working code to parse expressions formed of the +, - and ** binary operators using the Subrule Argument methods. Parentheses are also supported to produce more meaningful results.

In order to support parentheses and to give the operators their proper precedence, the rules used in the upcoming code are slightly different from those seen earlier. Where NUM used to be in the productions, you will now find term (in sum/sum_) and sum (in pow/pow_).

The code of both subsections produce the same output, an uncommented version of the following:

Demonstrates left-associativity 4-5+6 = 5 got 5 (4-5)+6 = 5 got 5 4-(5+6) = -7 got -7 Demonstrates right-associativity 4**3**2 = 262144 got 262144 (4**3)**2 = 4096 got 4096 4**(3**2) = 262144 got 262144

5.a. ...to Evaluate the Text at Parse-time

use strict; use warnings; use Parse::RecDescent (); my $grammar = <<'__END_OF_GRAMMAR__'; { use strict; use warnings; } parse : expr /^\Z/ { $item[1] } # Just an alias expr : pow # vvv lowest precedence # pow : sum '**' pow # | sum pow : sum pow_[ $item[1] ] pow_ : '**' pow { $arg[0] ** $item[2] } | { $arg[0] } # sum : sum /[+-]/ term # | term sum : term sum_[ $item[1] ] sum_ : '+' term sum_[ $arg[0] + $item[2] ] | '-' term sum_[ $arg[0] - $item[2] ] | { $arg[0] } # ^^^ highest precedence term : '(' expr ')' { $item[2] } | /\d+/ __END_OF_GRAMMAR__ my $parser = Parse::RecDescent->new($grammar) or die("Bad grammar\n"); foreach my $expr ( '4-5+6', # Demonstrates left-associativity '(4-5)+6', '4-(5+6)', '4**3**2', # Demonstrates right-associativity '(4**3)**2', '4**(3**2)', ) { my $expected = eval $expr; my $got = $parser->parse($expr); print("$expr = $expected got $got\n"); }

5.b. ...to Build and Evaluate a Parse Tree

use strict; use warnings; use Parse::RecDescent (); my $grammar = <<'__END_OF_GRAMMAR__'; { use strict; use warnings; } parse : expr /^\Z/ { $item[1] } # Just an alias expr : pow # vvv lowest precedence # pow : sum '**' pow # | sum pow : sum pow_[ $item[1] ] pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] } | { $arg[0] } # sum : sum /[+-]/ term # | term sum : term sum_[ $item[1] ] sum_ : /[+-]/ term sum_[ [ $item[1], $arg[0], $item[2] ] ] | { $arg[0] } # ^^^ highest precedence term : '(' expr ')' { $item[2] } | /\d+/ { [ @item ] } __END_OF_GRAMMAR__ my $parser = Parse::RecDescent->new($grammar) or die("Bad grammar\n"); my %eval = ( term => sub { $_[1] }, '+' => sub { eval_node($_[1]) + eval_node($_[2]) }, '-' => sub { eval_node($_[1]) - eval_node($_[2]) }, '**' => sub { eval_node($_[1]) ** eval_node($_[2]) }, ); sub eval_node { my ($node) = @_; $eval{$node->[0]}->(@$node); } foreach my $expr ( '4-5+6', # Demonstrates left-associativity '(4-5)+6', '4-(5+6)', '4**3**2', # Demonstrates right-associativity '(4**3)**2', '4**(3**2)', ) { my $expected = eval $expr; my $tree = $parser->parse($expr); my $got = eval_node($tree); print("$expr = $expected got $got\n"); }

Update Aug 13, 2006: The examples have been simplified. A right-associative operator is used for the right-associative examples. Parse-time eval was placed before parse tree building. Added section on simplifying right-recursion. Small additions were made here and there to improve clarity. It still needs to link to a tutorial on precedence.

Update Jun 13, 2014: Fixed spelling and grammar mistakes identified by hexcoder.

Update Oct 3, 2016: Fixed indexing problem raised by an anonymous monk.