Hi all,

Below is my RG grammar to parse simple Perl-like expressions. Sorry it's a bit long. Seems to be working (passing all my tests). Eventually I plan to emit Perl, JS, and PHP code from the parsed expression.

The only problem I'm having right now is: list of Answer's is exponentially slow. E.g. [1] and [1, 2] takes under a second on my PC, but [1, 2, 3] takes more than a second, [1, 2, 3, 4] takes more than 10 seconds, and so on. The same goes to foo(1, 2, 3, ...) and {a=>1, b=>2, c=>3, ...}

Turning on debugging, e.g. with: <debug:step> shows that there are *lots* of backtracking done by the RE engine. My guess for the cure is to prevent backtracking when possible, e.g. by adding possessive quantifiers. But right now I don't know how/where exactly to put them.

Any pointers appreciated. Thanks.

#!perl
use v5.10;
use warnings;
use Data::Dump 'pp';
use List::Util qw(reduce);
use POSIX;

my $vars = {
    a => 1,
    b => 2,
};

my $grammar = do{
    use Regexp::Grammars;
    qr{
        ^<Answer>$

        <rule: Answer>
            <MATCH=Or>

# precedence level  2: left     =>
        <rule: Pair>
            <K=Key> =\> <V=Answer>
            (?{ $MATCH = [$MATCH{K}, $MATCH{V}] })

# precedence level  3: left     || //
        <rule: Or>
            <[X=And]> ** <[Op=(\|\||//)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '||') { $MATCH ||= $term }
                    elsif ($op eq '//') { $MATCH //= $term }
                }
            })

# precedence level  4: left     &&
        <rule: And>
            <[X=BitOrXor]> ** <[Op=(&&)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '&&') { $MATCH &&= $term }
                }
            })

# precedence level  5: left     | ^
        <rule: BitOrXor>
            <[X=BitAnd]> ** <[Op=(\||\^)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '|') { $MATCH = $MATCH+0 | $term }
                    elsif ($op eq '^') { $MATCH = $MATCH+0 ^ $term }
                }
            })


# precedence level  6: left     &
        <rule: BitAnd>
            <[X=Equal]> ** <[Op=(&)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '&') { $MATCH = $MATCH+0 & $term }
                }
            })

# precedence level  7: nonassoc == != <=> eq ne cmp
        <rule: Equal>
            # \x3c = "<", \x3e = ">"
            <X=Term> <Op=(==|!=|\x3c=\x3e|eq|ne|cmp)> <Y=Term>
            (?{ my ($x, $op, $y) = ($MATCH{X}, $MATCH{Op}, $MATCH{Y});
                if    ($op eq '==' ) { $MATCH = ($x ==  $y) }
                elsif ($op eq '!=' ) { $MATCH = ($x !=  $y) }
                elsif ($op eq '<=>') { $MATCH = ($x <=> $y) }
                elsif ($op eq 'eq' ) { $MATCH = ($x eq  $y) }
                elsif ($op eq 'ne' ) { $MATCH = ($x ne  $y) }
                elsif ($op eq 'cmp') { $MATCH = ($x cmp $y) }
            })
          | <MATCH=LessGreater>

# precedence level  8: nonassoc < > <= >= lt gt le ge
        <rule: LessGreater>
            # \x3c = "<", \x3e = ">"
            <X=Term> <Op=(\x3c=?|\x3e=?|lt|gt|le|ge)> <Y=Term>
            (?{ my ($x, $op, $y) = ($MATCH{X}, $MATCH{Op}, $MATCH{Y});
                if    ($op eq '<' ) { $MATCH = ($x <  $y) }
                elsif ($op eq '<=') { $MATCH = ($x <= $y) }
                elsif ($op eq '>' ) { $MATCH = ($x >  $y) }
                elsif ($op eq '>=') { $MATCH = ($x >= $y) }
                elsif ($op eq 'lt') { $MATCH = ($x lt $y) }
                elsif ($op eq 'gt') { $MATCH = ($x gt $y) }
                elsif ($op eq 'le') { $MATCH = ($x le $y) }
                elsif ($op eq 'ge') { $MATCH = ($x ge $y) }
            })
          | <MATCH=BitShift>

# precedence level  9: left     << >>
        <rule: BitShift>
            # \x3c = "<", \x3e = ">"
            <[X=Add]> ** <[Op=(\x3c\x3c|\x3e\x3e)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '>>') { $MATCH >>= $term }
                    elsif ($op eq '<<') { $MATCH <<= $term }
                }
            })

# precedence level 10: left     + - .
        <rule: Add>
            <[X=Mult]> ** <[Op=(\+|-|\.)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '+') { $MATCH += $term }
                    elsif ($op eq '-') { $MATCH -= $term }
                    elsif ($op eq '.') { $MATCH .= $term }
                }
            })

# precedence level 11: left     * / % x
        <rule: Mult>
            <[X=Unary]> ** <[Op=(\*|/|%|x)]>
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $term (@{$MATCH{X}}) {
                    my $op = shift @{$MATCH{Op}//=[]};
                    if    ($op eq '*') { $MATCH *= $term }
                    elsif ($op eq '/') { $MATCH /= $term }
                    elsif ($op eq '%') { $MATCH %= $term }
                    elsif ($op eq 'x') { $MATCH x= $term }
                }
            })

# precedence level 12: right    ! ~ unary+ unary-
        <rule: Unary>
            <[Op=(!|~|\+|-)]>* <X=Pow>
            (?{ $MATCH = $MATCH{X};
                if ($MATCH{Op}) {
                    for my $op (reverse @{$MATCH{Op}}) {
                        if    ($op eq '!') { $MATCH = !$MATCH }
                        elsif ($op eq '-') { $MATCH = -$MATCH }
                        elsif ($op eq '~') { $MATCH = ~($MATCH+0) }
                    }
                }
            })

# precedence level 13: right    **
        <rule: Pow>
            <[X=Subscripting]> ** <Op=(\*\*)>
            (?{ $MATCH = reduce { $b ** $a } reverse @{$MATCH{X}} })

# precedence level 14: left    hash[s], array[i]
        <rule: Subscripting>
            <[X=Term]> <[Subscript]>*
            (?{ $MATCH = shift @{$MATCH{X}};
                for my $i (@{$MATCH{Subscript}}) {
                    if (ref($MATCH) eq 'ARRAY'  ) { $MATCH = $MATCH->[$i] }
                    elsif (ref($MATCH) eq 'HASH') { $MATCH = $MATCH->{$i} }
                    else                          { $MATCH = "error: invalid hash subscripting"; last }
                }
            })

        <rule: Subscript>
              \[ <MATCH=Term> \]

# precedence level 15: left     term (variable, str/num literals, func(), (paren))
        <rule: Term>
              <MATCH=Func>
            | <MATCH=Var>
            | <MATCH=Str>
            | <MATCH=Undef>
            | <MATCH=Num>
#            | <debug:step> <MATCH=Array> <debug:off>
            | <MATCH=Array>
            | <MATCH=Hash>
            | \( <MATCH=Answer> \)

        <rule: Array>
            \[ \]
            (?{ $MATCH = [] })
          | \[ <[X=Answer]> ** (,) \]
            (?{ $MATCH = $MATCH{X} })

        <rule: Hash>
            \{ \}
            (?{ $MATCH = {} })
          | \{ <[X=Pair]> ** (,) \}
            (?{ $MATCH = { map { $_->[0] => $_->[1] } @{ $MATCH{X} } } })

        <rule: Key>
            <MATCH=(\w+)>
          | <MATCH=Answer>

        <token: Undef>
            undef
            (?{ $MATCH = undef })

        <token: Num>
            <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )>

        <token: Str>
            # XXX support escapes
            <X=( "[^"]*" | '[^']*' )>
            (?{ $MATCH = substr($MATCH{X}, 1, length($MATCH{X})-2); })

        <rule: Var>
            \$ <X=(\w+)>
            (?{ $MATCH = $vars->{ $MATCH{X} } })

        <rule: Func>
            <FuncName=([A-Za-z_]\w*)> \( <[Args=Answer]> ** (,) \)
            (?{
                my $f = $MATCH{FuncName};
                my $args = $MATCH{Args};
                if    ($f eq 'length') { $MATCH = length($args->[0]) }
                elsif ($f eq 'ceil'  ) { $MATCH = POSIX::ceil($args->[0]) }
                elsif ($f eq 'floor' ) { $MATCH = POSIX::floor($args->[0]) }
                elsif ($f eq 'rand'  ) { $MATCH = rand() }
                else                   { $MATCH = "undef function $f" }
            })

    }xms
};

print "> ";
while (my $input = <>) {
    if ($input =~ $grammar) {
        say '--> ', pp($/{Answer});
    } else {
        say 'PARSE FAILURE';
    }
    print "> ";
}

In reply to Optimizing Regexp::Grammars' grammar (backtracking?) by sharyanto

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.