in reply to Optimizing Regexp::Grammars' grammar (backtracking?)
There's no reason for backtracking to even be possible for your grammar (except directly up to a parse failure), so it's not quantifier problem.
For example, let's look at
Add : Mult /[+-]/ Add | Mult
To avoid backtracking, factor out the common prefix:
Add : Mult ( /[+-]/ Add )*
You've written that as
<rule: Add> <[X=Mult]> ** <[Op=(\+|-)]>
That's fine. You got that one right. But starting with Equal, you introduced backtracking.
Simplified:<rule: Equal> <X=Term> <Op=(==|!=|\x3c=\x3e|eq|ne|cmp)> <Y=Term> | <MATCH=LessGreater>
Equal : Term OP Term | LessGreater
Problem: LessGreater can start with Term too!
I'm going to assume that preventing $x+1 > $y+2 was accidental, in which case you meant
Equal : LessGreater OP LessGreater | LessGreater
Factoring out the common prefix:
Equal : LessGreater (?: OP LessGreater )?
And that's written as
<rule: Equal> <X=LessGreater> (?: <Op=(==|!=|\x3c=\x3e|eq|ne|cmp)> <Y=LessGreate +r> )?
Or maybe not. That can causes Perl to crash or return nothing even though it matches fine. I tried various things including the following with no luck:
<rule: Equal> <X=LessGreater> (?: <Op=(==|!=|\x3c=\x3e|eq|ne|cmp)> <Y=LessGreater> (?{ ... } | (?{ ... } )
I'm using 5.10.0. Maybe it's been fixed in 5.10.1 or 5.12.0. Maybe you can find some way around the problem. Good luck!
#!perl use v5.10; use warnings; use Data::Dump 'pp'; use List::Util qw(reduce); use POSIX; my $vars = { a => 1, b => 2, }; my $grammar = do{ use Regexp::Grammars; qr{ ^ <debug:step> <Expr> $ <rule: Expr> <MATCH=Or> <rule: Pair> <K=Key> =\> <V=Expr> (?{ $MATCH = [$MATCH{K}, $MATCH{V}] }) # precedence level 12: left || // <rule: Or> <[X=And]> ** <[Op=(\|\||//)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '||') { $MATCH ||= $term } elsif ($op eq '//') { $MATCH //= $term } } }) # precedence level 11: left && <rule: And> <[X=BitOrXor]> ** <[Op=(&&)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '&&') { $MATCH &&= $term } } }) # precedence level 10: left | ^ <rule: BitOrXor> <[X=BitAnd]> ** <[Op=(\||\^)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '|') { $MATCH = $MATCH+0 | $term } elsif ($op eq '^') { $MATCH = $MATCH+0 ^ $term } } }) # precedence level 9: left & <rule: BitAnd> <[X=Equal]> ** <[Op=(&)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '&') { $MATCH = $MATCH+0 & $term } } }) # precedence level 8: nonassoc == != <=> eq ne cmp < > <= >= lt gt le + ge <rule: Equal> # \x3c = "<", \x3e = ">" <X=BitShift> (?: <Op=(==|!=|\x3c=\x3e|eq|ne|cmp|\x3c=?|\x3 +e=?|lt|gt|le|ge)> <Y=BitShift> )? (?{ my ($x, $op, $y) = ($MATCH{X}, $MATCH{Op}, $MATCH{Y}); if (!defined($op)) { $MATCH = $x } elsif ($op eq '==' ) { $MATCH = ($x == $y ?1:0) } elsif ($op eq '!=' ) { $MATCH = ($x != $y ?1:0) } elsif ($op eq '<=>' ) { $MATCH = ($x <=> $y ?1:0) } elsif ($op eq 'eq' ) { $MATCH = ($x eq $y ?1:0) } elsif ($op eq 'ne' ) { $MATCH = ($x ne $y ?1:0) } elsif ($op eq 'cmp' ) { $MATCH = ($x cmp $y ?1:0) } elsif ($op eq '<' ) { $MATCH = ($x < $y ?1:0) } elsif ($op eq '<=' ) { $MATCH = ($x <= $y ?1:0) } elsif ($op eq '>' ) { $MATCH = ($x > $y ?1:0) } elsif ($op eq '>=' ) { $MATCH = ($x >= $y ?1:0) } elsif ($op eq 'lt' ) { $MATCH = ($x lt $y ?1:0) } elsif ($op eq 'gt' ) { $MATCH = ($x gt $y ?1:0) } elsif ($op eq 'le' ) { $MATCH = ($x le $y ?1:0) } elsif ($op eq 'ge' ) { $MATCH = ($x ge $y ?1:0) } }) # precedence level 7: left << >> <rule: BitShift> # \x3c = "<", \x3e = ">" <[X=Add]> ** <[Op=(\x3c\x3c|\x3e\x3e)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '>>') { $MATCH >>= $term } elsif ($op eq '<<') { $MATCH <<= $term } } }) # precedence level 6: left + - . <rule: Add> <[X=Mult]> ** <[Op=(\+|-|\.)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '+') { $MATCH += $term } elsif ($op eq '-') { $MATCH -= $term } elsif ($op eq '.') { $MATCH .= $term } } }) # precedence level 5: left * / % x <rule: Mult> <[X=Unary]> ** <[Op=(\*|/|%|x)]> (?{ $MATCH = shift @{$MATCH{X}}; for my $term (@{$MATCH{X}}) { my $op = shift @{$MATCH{Op}//=[]}; if ($op eq '*') { $MATCH *= $term } elsif ($op eq '/') { $MATCH /= $term } elsif ($op eq '%') { $MATCH %= $term } elsif ($op eq 'x') { $MATCH x= $term } } }) # precedence level 4: right ! ~ unary+ unary- <rule: Unary> <[Op=(!|~|\+|-)]>* <X=Pow> (?{ $MATCH = $MATCH{X}; if ($MATCH{Op}) { for my $op (reverse @{$MATCH{Op}}) { if ($op eq '!') { $MATCH = !$MATCH } elsif ($op eq '-') { $MATCH = -$MATCH } elsif ($op eq '~') { $MATCH = ~($MATCH+0) } } } }) # precedence level 3: right ** <rule: Pow> <[X=Subscripting]> ** <Op=(\*\*)> (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{X}} }) # precedence level 2: left hash[s], array[i] <rule: Subscripting> <[X=Term]> <[Subscript]>* (?{ $MATCH = shift @{$MATCH{X}}; for my $i (@{$MATCH{Subscript}}) { if (ref($MATCH) eq 'ARRAY' ) { $MATCH = $MATCH->[ +$i] } elsif (ref($MATCH) eq 'HASH') { $MATCH = $MATCH->{ +$i} } else { $MATCH = "error: i +nvalid hash subscripting"; last } } }) <rule: Subscript> \[ <MATCH=Term> \] # precedence level 1: left term (variable, str/num literals, func( +), (paren)) <rule: Term> <MATCH=Func> | <MATCH=Var> | <MATCH=Str> | <MATCH=Undef> | <MATCH=Num> | <MATCH=Array> | <MATCH=Hash> | \( <MATCH=Expr> \) <rule: Array> \[ \] (?{ $MATCH = [] }) | \[ <[X=Expr]> ** (,) \] (?{ $MATCH = $MATCH{X} }) <rule: Hash> \{ \} (?{ $MATCH = {} }) | \{ <[X=Pair]> ** (,) \} (?{ $MATCH = { map { $_->[0] => $_->[1] } @{ $MATCH{X} } } + }) <rule: Key> <MATCH=(\w+)> | <MATCH=Expr> <token: Undef> undef (?{ $MATCH = undef }) <token: Num> <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )> <token: Str> # XXX support escapes <X=( "[^"]*" | '[^']*' )> (?{ $MATCH = substr($MATCH{X}, 1, length($MATCH{X})-2); }) <rule: Var> \$ <X=(\w+)> (?{ $MATCH = $vars->{ $MATCH{X} } }) <rule: Func> <FuncName=([A-Za-z_]\w*)> \( <[Args=Expr]> ** (,) \) (?{ my $f = $MATCH{FuncName}; my $args = $MATCH{Args}; if ($f eq 'length') { $MATCH = length($args->[0]) } elsif ($f eq 'ceil' ) { $MATCH = POSIX::ceil($args->[ +0]) } elsif ($f eq 'floor' ) { $MATCH = POSIX::floor($args-> +[0]) } elsif ($f eq 'rand' ) { $MATCH = rand() } else { $MATCH = "undef function $f" +} }) }xms }; for (;;) { print "> "; defined( my $input = <> ) or last; if ($input =~ $grammar) { say '--> ', pp($/{Expr}); } else { say 'PARSE FAILURE'; } }
Notice I combined all the relational operators. Because you don't allow them to be chained, they actually all had the same precedence.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Optimizing Regexp::Grammars' grammar (backtracking?)
by dgaramond2 (Monk) on Apr 15, 2010 at 03:04 UTC | |
by Anonymous Monk on Apr 15, 2010 at 05:08 UTC | |
by dgaramond2 (Monk) on Apr 15, 2010 at 14:42 UTC |