DevM has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am looking for some help with Parse::RecDescent. I wrote the following code to parse a string containing several parameters for an SQL style search. The goal is to create a tree I can traverse from the bottom up so that I can execute the code in order. I started with a simple example but if you look at the commented out input you will see where I hope to end up.
use Data::Dumper;
use Parse::RecDescent;

# my $input = "(cvtype='problem') and (problem_description match '*') and ((problem_synopsis match 'FCSIM') or (problem_synopsis match 'ATTE')) and (create_time>time('06/01/2014 0:00:00')) and (create_time<time('09/30/2014 0:00:00'))";
my $input = "(cvtype='problem') and (problem_description match '*')";
print "Text: $input\n";

my $grammar = q{
    startrule: expr

    expr: operand operation(s?)
        { $return = @{$item[2]} ? { $item[2], $item[1]} : $item[1]}

    operation: /and | or/ operand
        { $return = { $item[1], $item[2] }}

    operand: '(' expr ')'
        { $return = $item[2] }
      | term
        { $item[1]}

    term: /[\w\s=><\/:"'\*_]+/
};

my $parser = Parse::RecDescent->new($grammar);
my $result = $parser->startrule($input) or die "Could Not Parse!\n";
print Dumper $result;

The results look like this:

$VAR1 = {
          'ARRAY(0xa102e8)' => 'cvtype=\'problem\''
        };

I expected the results to look something like this:

$VAR1 = {
          { 'cvtype=\'problem\'' }
          { and }
          { problem_description match \'*\' }
        };

I know I am missing something fundamental, but I can’t seem to find it. I don't have any idea where that array ref is coming from. Any help you can provide would be appreciated.
P.S. I also have the Debug output but it's huge. Please let me know if you think I should post it.
Thank You,
DevM

Replies are listed 'Best First'.
Re: Can I get some rules help with PARSE::RECDESCENT
by ikegami (Patriarch) on Mar 09, 2018 at 18:09 UTC

    Your desired output makes no sense.

    syntax error at a.pl line 5, near "{"
    Can't find string terminator "'" anywhere before EOF at a.pl line 9.

    The (first) fundamental bit you are missing is a definition of what you want to get from the parser. If you don't even know what you want, how can you hope to write a parser that produces it!

      ikegami, thank you for your response. I will try to answer your questions.

      I am sorry my code did not work for you. I diffed it against the code on my system and it is working for me. I can offer no explanation for this behavior.

      I apologize for not being more forthcoming with my requirements. I had hoped that the example I provided would be enough. This is essentially an “order of operations” problem. The three operations I have are And, Or, and Not. I basically want to build a tree that lets me handle the terms in order, based on the precedence given by the parentheses, with And and Or at the same level and Not a little higher. (I know Not is not in my original code, but I was starting small.) Take the generic statement below.

      (term1) and (term2) and ((term3) or (term4)) and not (term5)

      I would like to boil this down into an array I can traverse bottom up getting to the innermost terms first.

      So first resolve term3 or term4

      Then resolve the result vs. not term5

      Then resolve the result vs. term2 and term1

      I figured traversing a tree structure would be the fastest way to do this but I am open to suggestions.

        You said this is your desired output:

        $VAR1 = { { 'cvtype=\'problem\'' } { and } { problem_description match \'*\' } };

        Your desired output makes no sense. It doesn't even compile.

        syntax error at a.pl line 5, near "{"
        Can't find string terminator "'" anywhere before EOF at a.pl line 9.

        Please fix.


        I would like to boil this down into an array

        Really? An array of what? Parsers normally produce trees.
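
To make the tree idea concrete, here is a minimal sketch of a bottom-up (children first) evaluation in plain Perl. The node shape, a term string or [ operator, operands ], is an assumption chosen for illustration and is not something Parse::RecDescent produces on its own. The %truth table is made up and stands in for actually running a search term.

```perl
use strict;
use warnings;

# Hypothetical tree for: (term1) and ((term3) or (term4)) and not (term5)
# A node is either a plain string (a term) or [ operator, operand(s) ].
my $tree =
    [ 'and',
        [ 'and', 'term1', [ 'or', 'term3', 'term4' ] ],
        [ 'not', 'term5' ],
    ];

# Stand-in for executing a real search term; these values are invented.
my %truth = ( term1 => 1, term3 => 0, term4 => 1, term5 => 0 );

my @order;    # records the order in which nodes are resolved

sub evaluate {
    my ($node) = @_;

    if ( !ref $node ) {    # leaf: look the term up
        push @order, $node;
        return $truth{$node} ? 1 : 0;
    }

    my ( $op, @operands ) = @$node;
    my @values = map { evaluate($_) } @operands;    # resolve children first
    push @order, $op;

    return $values[0] ? 0 : 1 if $op eq 'not';
    return ( $values[0] && $values[1] ) ? 1 : 0 if $op eq 'and';
    return ( $values[0] || $values[1] ) ? 1 : 0 if $op eq 'or';
    die "Unknown operator '$op'\n";
}

my $result = evaluate($tree);
print "result: $result\n";
print "order:  @order\n";
```

Because every operator node resolves its operands before itself, the innermost terms are always handled first, whatever the nesting depth.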

Re: Can I get some rules help with PARSE::RECDESCENT
by 7stud (Deacon) on Mar 10, 2018 at 10:32 UTC

    I know I am missing something fundamental... I don't have any idea where that array ref is coming from

    Have you taken a look at Some Tips(from a beginner) for using Parse::RecDescent:

    I suggest Dumping @item as the first line in an action and not writing any additional code in the action until you examine the output.

    Eventually, your expr rule gets an @item array that looks like this:

    expr rule:
    $VAR1 = [
              'expr',
              'cvtype=\'problem\'',
              [
                {
                  'and ' => 'problem_description match \'*\''
                }
              ]
            ];

    Then you return the hash ref  {$item[2], $item[1]}, where the key is $item[2], which is an array ref (it's surrounded by brackets in the Data::Dumper output). That's why you see:

    $VAR1 = { 'ARRAY(0xa102e8)' => 'cvtype=\'problem\'' };

    For instance:

    my $href = { [1, 2, 3], "hello" };
    say Dumper($href);

    --output:--
    $VAR1 = {
              'ARRAY(0x7fe1bf803638)' => 'hello'
            };

    Here's the proof:

    use strict;
    use warnings;
    use 5.020;
    use autodie;
    use Data::Dumper;
    use Parse::RecDescent;

    my $grammar =<<'END_OF_GRAMMAR';

    #Start up action(executed in parser namespace):
    {
        use 5.012; #enable say()
        use Data::Dumper;
    }

    startrule: expr
        {
            say "startrule:";
            say Dumper(\@item);
            $return = \@item;
        }

    expr: operand operation(s?)
        {
            say "expr rule:";
            say Dumper(\@item);

            my $result = @{$item[2]} ? { $item[2], $item[1]} : $item[1];
            say 'Inside expr rule: $result=';
            say Dumper($result);
            $return = $result;
        }

    operation: /and | or/ operand
        {
            say "operation rule:";
            say Dumper(\@item);
            $return = { $item[1], $item[2] };
        }

    operand: '(' expr ')'
        {
            say "operand rule:";
            say Dumper(\@item);
            $return = $item[2];
        }
      | term {say "term rule:";say Dumper(\@item);$item[1] }

    term: /[\w\s=><\/:"'\*_]+/

    END_OF_GRAMMAR

    my $input = "(cvtype='problem') and (problem_description match '*')";
    print "Text: $input\n";

    my $parser = Parse::RecDescent->new($grammar);
    my $result = $parser->startrule($input) or die "Could Not Parse!\n";
    say '$result:';
    print Dumper $result;

    --output:--
    Text: (cvtype='problem') and (problem_description match '*')
    term rule:
    $VAR1 = [
              'operand',
              'cvtype=\'problem\''
            ];

    expr rule:
    $VAR1 = [
              'expr',
              'cvtype=\'problem\'',
              []
            ];

    Inside expr rule: $result=
    $VAR1 = 'cvtype=\'problem\'';

    operand rule:
    $VAR1 = [
              'operand',
              '(',
              'cvtype=\'problem\'',
              ')'
            ];

    term rule:
    $VAR1 = [
              'operand',
              'problem_description match \'*\''
            ];

    expr rule:
    $VAR1 = [
              'expr',
              'problem_description match \'*\'',
              []
            ];

    Inside expr rule: $result=
    $VAR1 = 'problem_description match \'*\'';

    operand rule:
    $VAR1 = [
              'operand',
              '(',
              'problem_description match \'*\'',
              ')'
            ];

    operation rule:
    $VAR1 = [
              'operation',
              'and ',
              'problem_description match \'*\''
            ];

    expr rule:
    $VAR1 = [
              'expr',
              'cvtype=\'problem\'',
              [
                {
                  'and ' => 'problem_description match \'*\''
                }
              ]
            ];

    Inside expr rule: $result=
    $VAR1 = {
              'ARRAY(0x7fb68384ac98)' => 'cvtype=\'problem\''
            };

    startrule:
    $VAR1 = [
              'startrule',
              {
                'ARRAY(0x7fb68384ac98)' => 'cvtype=\'problem\''
              }
            ];

    $result:
    $VAR1 = [
              'startrule',
              {
                'ARRAY(0x7fb68384ac98)' => 'cvtype=\'problem\''
              },
              $VAR1
            ];

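
One more detail is visible in the dump above: the operator comes back as 'and ' with a trailing space. Without the /x modifier the spaces inside /and | or/ are literal, so the pattern means "and followed by a space" or "a space followed by or". The sketch below shows this with plain regex matches and no Parse::RecDescent involved. The \A anchor and the sample strings are assumptions, meant to imitate a terminal being tried after leading whitespace has already been skipped, which is my understanding of the module's default behaviour.

```perl
use strict;
use warnings;

my %seen;
for my $text ( 'and (b)', 'or (b)' ) {
    # Spaces are literal here: matches "and " or " or"
    my ($loose) = $text =~ /\A(and | or)/;

    # No stray spaces, plus a word boundary: matches "and" or "or"
    my ($tight) = $text =~ /\A(and|or)\b/;

    $seen{$text} = [ $loose, $tight ];
    printf "%-8s loose=%-7s tight=%s\n",
        "'$text'",
        defined $loose ? "'$loose'" : 'undef',
        defined $tight ? "'$tight'" : 'undef';
}
```

Under that assumption, a pattern such as /and|or/ (or /and | or/x) would let the 'or' branch match as well and would return the operator without the trailing space.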
Re: Can I get some rules help with PARSE::RECDESCENT
by beartham (Novice) on Mar 10, 2018 at 19:55 UTC

    It has been about 18 years since I used RecDescent, but looking at your grammar, it seems that the array reference is coming from the '@' in the action of your expr rule coupled with the brackets in your term rule. The term rule also seems to be misplaced, with its leading '|', so its regex may not be recognized as a regex, confusing RecDescent. What is the '|' supposed to be an alternative of, the operand rule parenthesized expression?

    I liked RecDescent for its EBNF, because I had much previous experience with another recursive descent metacompiler, TREE META, but I found it far too slow for my design work. I ended up using Bison. I wonder if Damian has improved its speed by re-coding some of it in C since I last used it. I wish I had more time to work on it.

      What is the '|' supposed to be an alternative of, the operand rule parenthesized expression?

      A rule can have alternatives, e.g. rule_name: alt1 | alt2, and you can insert an action in the middle of a rule:

      rule_name: alt1 {action} | alt2

      And, you can add whatever whitespace you want:

      rule_name: alt1
                     {action}
               | alt2

      And you can add an action at the end of a rule:

      rule_name: alt1 {action} | alt2 {action}
Re: Can I get some rules help with PARSE::RECDESCENT
by Anonymous Monk on Mar 12, 2018 at 13:58 UTC
    Incidentally, I found that this is a great place to use __DATA__ ... put the grammar after that line and slurp it into the variable.
Re: Can I get some rules help with PARSE::RECDESCENT
by Anonymous Monk on Mar 12, 2018 at 14:07 UTC
    Normally a rule such as expr would be written in right-recursive style: expr: expression expr(?). The output then always contains two productions, but the second may be a recursive instance. Also bear in mind that this tool generates recursive descent parsers, which do not know about niceties like operator precedence. So the definition of expr normally has to consist of other rules (such as term and factor) expressly to handle precedence. The author points out some differences between this and tools like yacc, but there are many others.
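
The layering described above can be sketched without the module, as a small hand-written recursive descent parser in plain Perl. This is an illustration of the technique, not a Parse::RecDescent grammar: the rule names (expr, unary, primary), the tokenizer, and the output shape are all assumptions, and terms are simplified to runs of characters containing no whitespace or parentheses. It keeps 'and' and 'or' on one level and binds 'not' one level tighter, which is the precedence asked for earlier in the thread.

```perl
use strict;
use warnings;
use Data::Dumper;

# One sub per grammar rule (hypothetical names):
#
#   expr    : unary ( ('and' | 'or') unary )*
#   unary   : 'not' unary | primary
#   primary : '(' expr ')' | term

# Tokens are "(", ")", or a run of non-space, non-parenthesis characters.
sub tokenize {
    my ($text) = @_;
    return $text =~ /[()]|[^\s()]+/g;
}

sub parse {
    my ($text) = @_;
    my @tokens = tokenize($text);
    my $tree   = parse_expr( \@tokens );
    die "Unexpected token '$tokens[0]'\n" if @tokens;
    return $tree;
}

# Lowest level: 'and' and 'or' share it and associate to the left.
sub parse_expr {
    my ($tokens) = @_;
    my $left = parse_unary($tokens);
    while ( @$tokens && $tokens->[0] =~ /\A(?:and|or)\z/i ) {
        my $op    = lc( shift @$tokens );
        my $right = parse_unary($tokens);
        $left = [ $op, $left, $right ];
    }
    return $left;
}

# One level higher: 'not' grabs the single operand that follows it.
sub parse_unary {
    my ($tokens) = @_;
    if ( @$tokens && $tokens->[0] =~ /\Anot\z/i ) {
        shift @$tokens;
        return [ 'not', parse_unary($tokens) ];
    }
    return parse_primary($tokens);
}

# Highest level: a parenthesized expression or a bare term.
sub parse_primary {
    my ($tokens) = @_;
    die "Unexpected end of input\n" unless @$tokens;
    my $token = shift @$tokens;
    if ( $token eq '(' ) {
        my $inner = parse_expr($tokens);
        my $close = shift @$tokens;
        die "Expected ')'\n" unless defined $close && $close eq ')';
        return $inner;
    }
    die "Unexpected ')'\n" if $token eq ')';
    return $token;
}

local $Data::Dumper::Indent = 1;
print Dumper( parse("(t1) and (t2) and ((t3) or (t4)) and not (t5)") );
```

Each precedence level gets its own rule, and a rule only ever calls the next tighter level for its operands. In a grammar-based tool the same shape would be written as one rule per level.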

      OK, thanks to 7stud's example I have had some success with this version of the parser.

      ###################################################################
      use strict;
      use warnings;
      $::RD_HINT=1;
      use Data::Dumper;
      #use autodie;
      use Parse::RecDescent;

      my $grammar =<<'END_OF_GRAMMAR';

      #Start up action(executed in parser namespace):
      {
          use Data::Dumper;
      }

      startrule: expr
          {
              print "startrule:";
              print Dumper(\@item);
              $return = \@item;
          }

      expr: operand(s)
          {
              print "expr rule:";
              print Dumper(\@item);

              my $result = $item[1];
              print 'Inside expr rule: $result=';
              print Dumper($result);
              $return = $result;
          }

      operand: /and|or/i
          {
              print "operand and/or:";
              print Dumper(\@item);
              $return = lc($item[1]);
          }

      operand: /not/i
          {
              print "operand not:";
              print Dumper(\@item);
              $return = [lc($item[1])];
          }

      operand: '(' expr ')'
          {
              print "operand rule:";
              print Dumper(\@item);
              $return = $item[2];
          }
        | term { $item[1] }

      term: /[\w\s=><\/:"'\*_]+/

      END_OF_GRAMMAR

      my $input = "(cvtype='problem') and ((problem_description match '*') and (((problem_synopsis match 'FCSIM') or (problem_synopsis match 'ATTE')) OR not ((create_time>time('06/01/2014 0:00:00')) and (create_time<time('09/30/2014 0:00:00')))))";
      #my $input = "(cvtype='problem') OR not (problem_description match '*')";
      print "Text: $input\n";

      my $parser = Parse::RecDescent->new($grammar);
      my $result = $parser->startrule($input) or die "Could Not Parse!\n";
      print "\n\nresult:";
      print Dumper $result;

      The results look almost exactly like what I was hoping for:

      result:$VAR1 = [
                'startrule',
                [
                  [
                    'cvtype=\'problem\''
                  ],
                  'and',
                  [
                    [
                      'problem_description match \'*\''
                    ],
                    'and',
                    [
                      [
                        [
                          'problem_synopsis match \'FCSIM\''
                        ],
                        'or',
                        [
                          'problem_synopsis match \'ATTE\''
                        ]
                      ],
                      'or',
                      'not',
                      [
                        [
                          'create_time>time',
                          [
                            '\'06/01/2014 0:00:00\''
                          ]
                        ],
                        'and',
                        [
                          'create_time<time',
                          [
                            '\'09/30/2014 0:00:00\''
                          ]
                        ]
                      ]
                    ]
                  ]
                ],
                $VAR1
              ];

      My only complaint is that 'not' and 'and/or' are on the same level. I really want 'not' to have higher precedence than 'and/or', hopefully making the output look like this:

      result:$VAR1 = [
                'startrule',
                [
                  [
                    'cvtype=\'problem\''
                  ],
                  'and',
                  [
                    [
                      'problem_description match \'*\''
                    ],
                    'and',
                    [
                      [
                        [
                          'problem_synopsis match \'FCSIM\''
                        ],
                        'or',
                        [
                          'problem_synopsis match \'ATTE\''
                        ]
                      ],
                      'or',
                      [
                        [
                          'not'
                        ],
                        [
                          [
                            'create_time>time',
                            [
                              '\'06/01/2014 0:00:00\''
                            ]
                          ],
                          'and',
                          [
                            'create_time<time',
                            [
                              '\'09/30/2014 0:00:00\''
                            ]
                          ]
                        ]
                      ]
                    ]
                  ]
                ],
                $VAR1
              ];

      I have tried to make a callback to expr. But that cuts the run short. Any help I can get would be greatly appreciated.

      DevM