in reply to Order of Precedence in Parse::RecDescent grammar

See Re: Left-associative binary operators in Parse::RecDescent for my preferred technique of doing precedence in grammars (stratification). You have one layer (stratum) for each precedence levels. Your grammar's starting rule is at the lowest precedence and the subrules increase in precedence. Finally, at the bottom level (of highest precedence), you have the rule for parenthesized expressions which recurses back to the top level (of lowest precedence). But this is unambiguous, because of the parentheses in this recursion rule.

P::RD may have other bells & whistles to help with precedence, but this is the only way I can ever remember to do precedence correctly, and it works with any kind of parser (although you may have to factor out left-recursion in right-associative operators).

Update: BTW, you can string together all operations of the same precedence level, which is probably more efficient. I.e, parse (a+b)*c*d*e*(f+g) into a shallow parse tree whose root has 5 children, instead of having a deeper parse tree whose nodes always have 2 children. However, for simplicity/uniformity it's often nice to always know that each node of the parse tree is a simple binary operation.

blokhead

  • Comment on Re: Order of Precedence in Parse::RecDescent grammar

Replies are listed 'Best First'.
Re^2: Order of Precedence in Parse::RecDescent grammar
by suaveant (Parson) on Jun 01, 2005 at 17:12 UTC
    So... away I went... and that seems to work, BUT!

    where when I parsed 4+1*2 before I got

    $VAR1 = [ '4', '+', [ '1', '*', '2' ] ];
    Now I get
    $VAR1 = [ [ [ [ '4' ] ], '+', [ [ '1' ], '*', [ '2' ] ] ] ];
    Ick... any good way to clean that up? It gets really nasty with more complex statements.
    Here is the new grammar.
    startrule: comparison_expr comparison_expr: additive_expr (comparison_op additive_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } comparison_op: />=?|<=?|!=|==?|le|ge|eq|ne|lt|gt/ additive_expr: multiplicative_expr (additive_op multiplicative_expr { [ @item[1..2] ] })(s +?) { [ $item[1], map { @ $_ } @{$item[2]} ] } additive_op: /[+-]/ multiplicative_expr: modulus_expr (multiplicative_op modulus_expr { [ @item[1..2] ] })(s? +) { [ $item[1], map { @ $_ } @{$item[2]} ] } multiplicative_op: /[*\/]/ modulus_expr: paren_expr ('%' paren_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } paren_expr: '(' comparison_expr ')' { $item[2] } | number number: /[+-]?\d+/ { $item[1] }

                    - Ant
                    - Some of my best work - (1 2 3)

      That's another reason why it's simpler to have just binary operations at each level (instead of a longer sequence of same-precedence operations). ;)

      Consider the generic outline:

      lower_prec_expr : higher_prec_expr lower_prec_op lower_prec_expr | higher_prec expr
      The problem is that you want your parse to give different semantic values for the different branches. If you take the 2nd branch, it's just a "pass-through" of the semantic value and you don't want to construct a new arrayref around it. If you take the first branch where there is actually an operation, you need to construct an arrayref with the appropriate things.

      You've combined both of these branches into one production in the grammar like this:

      lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { [ "put stuff in arrayref" ] }
      So now you're putting things inside an arrayref even in the "pass-through" case, where there is no operator at this point in the parse.

      To fix this, put a conditional in your semantic rule to distinguish these cases, i.e:

      lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { @{$item[2]} ? [ ... ] : $item[1] }
      That's kinda nasty. Alternatively, have two different productions explicitly in the grammar:
      lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s) { [ ... ] } | higher_prec_expr
      I.e, if there *are* operators here (no question mark on the (s) quantifier), we build an arrayref for the semantic value with the first branch. Otherwise, the second branch has no semantic rule, so it's just a "pass-through" of the semantic value and does not add layers of arrayref.

      The latter is probably the best choice. Just change the (s?)'s to (s)'s and add the other alternation to the grammar.

      blokhead