Parse::RecDescent and mini-language parsing

Flame has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
•Re: Parse::RecDescent and mini-language parsing by merlyn (Sage) on Mar 30, 2003 at 02:44 UTC
While you'll very likely get some good responses to this at the Monastery, you'll get to a more select audience (including TheDamian) if you post this question to the P::RD mailing list.
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: •Re: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Mar 30, 2003 at 03:09 UTC
Thanks, I didn't know there was one. Just started looking at the docs this morning.
My code doesn't have bugs, it just develops random features.
Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)
•Re: Parse::RecDescent and mini-language parsing by merlyn (Sage) on Mar 30, 2003 at 17:45 UTC
OK, just taking a quick untested whack at your original language. Presuming you want and/or/xor to be at the same precedence level, I'd code it like this:

    expression: <leftop: term termop term>
    termop: 'and' | 'or' | 'xor'
    term: '(' expression ')' | condition
    condition: field comparison value
    field: '<' timethingy '>'
    timethingy: 'DAY' | 'WEEK'
    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
    value: /\d+/

That'll probably get you mostly started.
Re: •Re: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Mar 30, 2003 at 23:50 UTC
Wow, this certainly seems more efficient, I'll start messing around with it and see what I can come up with. Thanks!
•Re: Re: •Re: Parse::RecDescent and mini-language parsing by merlyn (Sage) on Mar 31, 2003 at 00:39 UTC
And if you wanted 'and' to be higher precedence than 'or' and 'xor', so as not to confuse the rest of us:

    expression: <leftop: term termop term>
    termop: 'or' | 'xor'
    term: <leftop: factor factorop factor>
    factorop: 'and'
    factor: '(' expression ')' | condition
    condition: field comparison value
    field: '<' timethingy '>'
    timethingy: 'DAY' | 'WEEK'
    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
    value: /\d+/
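To see what the extra grammar level buys, here is a minimal hand-rolled sketch of the same layering in plain Perl. It is not Parse::RecDescent and not anyone's code from this thread: the sub names (tokenize, parse_factor, parse_term, parse_expression, evaluate) are hypothetical, and the literals 0 and 1 stand in for real conditions purely as an assumption to keep it short.

```perl
use strict;
use warnings;

# Hand-rolled sketch of the two-level grammar (NOT Parse::RecDescent):
#   expression: term   (('or' | 'xor') term)*    lower precedence
#   term:       factor ('and' factor)*           higher precedence
#   factor:     '(' expression ')' | 0 | 1       0/1 are placeholder conditions

sub tokenize {
    my ($text) = @_;
    # Collect parentheses, operator words and the 0/1 placeholders.
    # Anything else is silently skipped in this sketch.
    return [ $text =~ /\(|\)|\band\b|\bxor\b|\bor\b|[01]/gi ];
}

sub parse_factor {
    my ($tok) = @_;
    my $t = shift @$tok;
    die "unexpected end of input\n" unless defined $t;
    if ($t eq '(') {
        my $value = parse_expression($tok);
        my $close = shift @$tok;
        die "expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "unexpected token '$t'\n" unless $t =~ /^[01]$/;
    return $t + 0;
}

sub parse_term {
    my ($tok) = @_;
    my $value = parse_factor($tok);
    while (@$tok && lc($tok->[0]) eq 'and') {
        shift @$tok;
        my $rhs = parse_factor($tok);
        $value = ($value && $rhs) ? 1 : 0;
    }
    return $value;
}

sub parse_expression {
    my ($tok) = @_;
    my $value = parse_term($tok);
    while (@$tok && $tok->[0] =~ /^(?:or|xor)$/i) {
        my $op  = lc(shift @$tok);
        my $rhs = parse_term($tok);
        $value = $op eq 'or' ? (($value || $rhs) ? 1 : 0)
                             : (($value xor $rhs) ? 1 : 0);
    }
    return $value;
}

sub evaluate {
    my ($text) = @_;
    my $tok   = tokenize($text);
    my $value = parse_expression($tok);
    die "trailing input\n" if @$tok;    # end of input is checked once, at the top
    return $value;
}

print evaluate('1 or 0 and 0'), "\n";      # grouped as 1 or (0 and 0)
print evaluate('(1 or 0) and 0'), "\n";    # parentheses override the grouping
```

With all three operators on one level, the first input would instead group left to right as (1 or 0) and 0, which is the kind of surprise the second level avoids.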
Re: •Re: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Mar 31, 2003 at 05:26 UTC
Note: I have seen your other suggestion, but what follows is based on this code.

Well, I've modified your suggestion a little and, assuming I understood its intent, I seem to have broken it somewhere along the line: it still does not handle ( ) properly, and I can't see why. This is what I now have:

    use Parse::RecDescent;
    use Data::Dumper;
    use strict;
    use warnings;

    $::RD_TRACE = 1;

    my $grammar = q~
    expression: <leftop: term termop term> eod
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        | condition
    condition: element comparison element
    element: '<' <commit> /-?\w+/ '>'
        | /\d+/
    comparison: /=[><]=/ <commit> <error: Unable to match comparison>
        | /=?[><]=?/
        | '='
        | '!='
    eod: /^\Z/
    ~;

    my $parser = new Parse::RecDescent($grammar) or die;
    #defined($parser->RecTest('h =>= h')) or die;

    my $test = '(<MONTH> => <DAY>)'; # or (<MONTH> = <MARCH> and <DAY> = <TUESDAY>)';
    print Dumper($parser->expression($test));

I repeatedly get "$VAR1 = undef;" in reply, along with enough debugging output to scroll off my screen. (Unfortunately I can't capture it in a file either: Parse::RecDescent overrides any redeclaration of STDERR, and "perl language.pl | more" and similar methods don't capture the error messages.) Does anyone see what I'm missing? It's probably obvious, but I've been staring at it for 4 hours now, and the only constructive result has been condensing a few rules in the grammar declaration.
Re: Re: •Re: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Mar 31, 2003 at 20:01 UTC
Flame feels ridiculously stupid. OK, found THAT problem. The "eod" rule at the end of "expression" was being applied whenever expression was used, including inside parentheses, meaning that the ')' had to somehow come after the end of the data. With that fixed, I need to do a few more tests, but this is how it looks NOW:

    my $grammar = q~
    logic: expression eod
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        | condition
    condition: element comparison element
    element: '<' <commit> /-?\w+/ '>'
        | /\d+/
    comparison: /=[><]=/ <commit> <error: Unable to match comparison>
        | /=?[><]=?/
        | '='
        | '!='
    eod: /^\Z/
    ~;
Re: Parse::RecDescent and mini-language parsing by castaway (Parson) on Mar 30, 2003 at 17:14 UTC
Your expressions and subexpressions look a little too complicated to me. Try setting these two variables:

    $RD_TRACE = 1;
    $RD_HINT = 1;

And watch what happens. The ')' that it prints is the result of the first set of brackets (if you don't create any actions, it returns the last thing that successfully matched). It threw the rest away as it couldn't match it. Here's a snippet of something that I was using to evaluate expressions, maybe it helps:

    condition: booleanexpr
        | '&&' booleanexpr
        | '||' booleanexpr
    booleanexpr: '!' booleanexpr
            { $return = $item[1] . $item[2]; }
        | '(' booleanexpr ')'
            { $return = $item[1] . $item[2] . $item[3]; }
        | Value_op comparator Value_op
            { $return = $item[1] . $item[2] . $item[3]; }
    comparator: '==' | '<' | '>' | '<=' | '>=' | '!='
    Value_op: operation | Value
    Value: String | Number | Float
    operation: operator plusminus(?)
            {
                my $op;
                $op = '';
                if(@{$item[2]}) { $op = $item[2]->[0]; }
                $return = $item[1] . $op;
                1;
            }
    plusminus: '+' operator
            { $return = $item[1] . $item[2]; }
        | '-' operator
            { $return = $item[1] . $item[2]; }
    operator: ('*' Value | '/' Value)
        | '(' operation ')'
        | Value

C.
Re: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Mar 30, 2003 at 05:46 UTC
Cleaning up a little: the COMPARE rule now reads:

    COMPARE: /(?!=[><]=)=?[><]=?/ | '!=' | '='
Solution: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Apr 04, 2003 at 21:18 UTC
Well, just so everyone knows, I finally found the solution:

    my $grammar = q~
    #This is what I actually use $parser->logic($string);
    logic: expression eod
        { $return = $item[1]; }
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
        | condition
    condition: element comparison element
        { $return = main::process(@item[1..3]); }
    element: '<' <commit> /-?\w+/ '>'
        { $return = "<$item[3]>"; } #Return this so that the condition value can be set
        | /\d+/ # num is automatically returned
    comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
        | /=?[><]=?/
        | '='
        | '!='
    eod: /^\Z/
    ~;
•Re: Solution: Parse::RecDescent and mini-language parsing by merlyn (Sage) on Apr 04, 2003 at 23:52 UTC
I still don't get your code here:

    comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
        | /=?[><]=?/
        | '='
        | '!='

If you did it the way I said, it's simpler and more direct:

    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='

The matches are executed from left to right, so as long as you specify the longer match first, it does the right thing. Your code seems clumsy.

Also, I really don't see the need for those commits. Again, they clutter up the grammar. Have you tried removing them?
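The left-to-right rule can be illustrated without the parser at all. The following is a minimal plain-Perl sketch, not Parse::RecDescent itself: first_match and the two operator lists are hypothetical names made up for this illustration, and the only assumption is that alternatives are tried in the order written and the first one that fits is taken.

```perl
use strict;
use warnings;

# Try each literal in order and return the first one that is a prefix of
# $text, mimicking alternatives being tried left to right.
sub first_match {
    my ($text, @alternatives) = @_;
    for my $literal (@alternatives) {
        return $literal if index($text, $literal) == 0;
    }
    return;    # nothing matched
}

my @longest_first  = ('<=', '<', '=', '>=', '>', '!=');
my @shortest_first = ('<', '<=', '=', '>', '>=', '!=');

print first_match('<= 4', @longest_first), "\n";     # takes '<='
print first_match('<= 4', @shortest_first), "\n";    # takes '<', stranding the '='
```

With the shorter literal listed first, the leftover '=' is what the next part of the rule would have to cope with, which is why the longer operators go first.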
Re: •Re: Solution: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Apr 05, 2003 at 02:23 UTC
OK, I ran the benchmark, and it seems the regex form is faster. I have tested it several times with the following code and results:

    use Parse::RecDescent;
    use Date::Calc qw(:all);
    use Benchmark ':all';
    use strict;
    use warnings;

    my $grammar1 = q~
    logic: expression eod
        { $return = $item[1]; }
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
        | condition
    condition: element comparison element
        { $return = main::process(@item[1..3]); }
    element: '<' <commit> /-?\w+/ '>'
        { $return = "<$item[3]>"; } #Return this so that the condition value can be set
        | /\d+/ # num is automatically returned
    comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
        | /=?[><]=?/
        | '='
        | '!='
    eod: /^\Z/
    ~;

    my $grammar2 = q~
    logic: expression eod
        { $return = $item[1]; }
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
        | condition
    condition: element comparison element
        { $return = main::process(@item[1..3]); }
    element: '<' <commit> /-?\w+/ '>'
        { $return = "<$item[3]>"; } #Return this so that the condition value can be set
        | /\d+/ # num is automatically returned
    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
    eod: /^\Z/
    ~;

    my $parser1 = new Parse::RecDescent($grammar1) or die;
    my $parser2 = new Parse::RecDescent($grammar2) or die;

    my $test = '<DAY> = 4 or <DAY> > 4 or <DAY> < 4 or <DAY> >= 4 or <DAY> <= 4 or <DAY> != 4';

    cmpthese(10000, {
        'regex' => sub { $parser1->logic($test); },
        'quote' => sub { $parser2->logic($test); },
    });

Yielded:

            Rate quote regex
    quote 116/s    --   -5%
    regex 123/s    6%    --

The next run added an =<= comparison, the kind of operator (like =>=) that I wanted to avoid. It slowed both down slightly, but the regex was still in the lead.

    use Parse::RecDescent;
    use Date::Calc qw(:all);
    use Benchmark ':all';
    use strict;
    use warnings;

    my $grammar1 = q~
    logic: expression eod
        { $return = $item[1]; }
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
        | condition
    condition: element comparison element
        { $return = main::process(@item[1..3]); }
    element: '<' <commit> /-?\w+/ '>'
        { $return = "<$item[3]>"; } #Return this so that the condition value can be set
        | /\d+/ # num is automatically returned
    comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
        | /=?[><]=?/
        | '='
        | '!='
    eod: /^\Z/
    ~;

    my $grammar2 = q~
    logic: expression eod
        { $return = $item[1]; }
    expression: <leftop: term termop term>
    termop: /and/i | /xor/i | /or/i
    term: '(' <commit> expression ')'
        { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
        | condition
    condition: element comparison element
        { $return = main::process(@item[1..3]); }
    element: '<' <commit> /-?\w+/ '>'
        { $return = "<$item[3]>"; } #Return this so that the condition value can be set
        | /\d+/ # num is automatically returned
    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
    eod: /^\Z/
    ~;

    my $parser1 = new Parse::RecDescent($grammar1) or die;
    my $parser2 = new Parse::RecDescent($grammar2) or die;

    my $test = '<DAY> = 4 or <DAY> > 4 or <DAY> < 4 or <DAY> >= 4 or <DAY> <= 4 or <DAY> != 4 or <DAY> =<= 4';

    cmpthese(10000, {
        'regex' => sub { $parser1->logic($test); },
        'quote' => sub { $parser2->logic($test); },
    });

Yielded:

            Rate quote regex
    quote 104/s    --   -7%
    regex 113/s    8%    --

I admit, neither is as fast as I would like, but it certainly appears that the regex is the faster method here, unless I made a mistake.

Edit: If you feel this was biased in any way, feel free to suggest another string, or another test altogether. I don't use the Benchmark module frequently, and I may have inadvertently allowed for some bias.
•Re: Re: •Re: Solution: Parse::RecDescent and mini-language parsing by merlyn (Sage) on Apr 05, 2003 at 04:17 UTC
Re: •Re: Solution: Parse::RecDescent and mini-language parsing by Flame (Deacon) on Apr 05, 2003 at 01:21 UTC
Valid points both; perhaps I should do a benchmark and see which of our methods is faster. My goal was to decrease the number of alternatives in the production, but perhaps regexes aren't the way to go. As for the first line, I wanted to be sure that no one ever wrote =>=, to simplify later processing. Looking at it now I don't think it would make much of a difference, but it does allow me to specify what the error was. As for the comments, this is part of a school project, and while the comments do need to be cleaned up, they will help me explain in a hurry the purpose of each element to the person reviewing it. I'll look into those benchmarks and, if I remember, post them here soon... too much to do, too little time...