Flame has asked for the wisdom of the Perl Monks concerning the following question:

My Fellow Monks,

I am attempting to create a mini-language to allow users to set when certain events are marked on a calendar. (This is primarily used for repeated events, e.g. "every Thursday" kinds of things.) I have come up with the base concept for what I want the language to look like, but I'm having difficulty turning it into something concrete and then parsing it.

Samples:

I'm attempting to parse this with Parse::RecDescent, and I think I'm on the right track, but it's behaving funny (or I don't know how to use it properly.)

The following is my test code (variable grammar is incomplete, everything is compared to 'h'):

use Parse::RecDescent;
use strict;
use warnings;

my $grammar = qq~
    RecTest: ExpSet(s) ( Expression | SubExp )
           | Expression
           | SubExp
           | <error>

    Expression: ELEMENT COMPARE ELEMENT

    ExpSet: ( Expression | SubExp ) JOIN

    SubExp: '(' (ExpSet(s) Expression | Expression) ')'

    ELEMENT: 'h'

    COMPARE: /(?:=(?![><]=))?[><]=?/ | '!=' | '='

    JOIN: /and/i | /or/i | /xor/i
~;

my $parser = new Parse::RecDescent($grammar) or die;

print $parser->RecTest('(h = h and h != h) and ((h > h or h < h') or die;

When I run this, it prints out ')', which is not entirely useful but implies that it is succeeding, I think... The problem is, it should NOT be succeeding with those arguments because the parentheses are unbalanced, so I assume I made some mistake, but I don't really understand the syntax well enough to know where. Also, I'm having a hard time figuring out how to do anything with what it attempts to parse (so I can actually evaluate what they want and determine if it occurs on any given day).

Perhaps I'm going about this wrong, if so, I'm open to other ideas. Thanks for any help or advice you can give.

Note: This is my first time even seriously looking at Parse::RecDescent.



My code doesn't have bugs, it just develops random features.

Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

Replies are listed 'Best First'.
•Re: Parse::RecDescent and mini-language parsing
by merlyn (Sage) on Mar 30, 2003 at 02:44 UTC
      Thanks, I didn't know there was one. Just started looking at the docs this morning.



      My code doesn't have bugs, it just develops random features.

      Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

•Re: Parse::RecDescent and mini-language parsing
by merlyn (Sage) on Mar 30, 2003 at 17:45 UTC
    OK, just taking a quick untested whack at your original language. Presuming you want and/or/xor to be at the same precedence level, I'd code it like this:
    expression: <leftop: term termop term>
    termop: 'and' | 'or' | 'xor'
    term: '(' expression ')' | condition
    condition: field comparison value
    field: '<' timethingy '>'
    timethingy: 'DAY' | 'WEEK'
    comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
    value: /\d+/
    That'll probably get you mostly started.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Wow, this certainly seems more efficient, I'll start messing around with it and see what I can come up with. Thanks!



      My code doesn't have bugs, it just develops random features.

      Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

        And if you wanted 'and' to be higher precedence than 'or' and 'xor', so as not to confuse the rest of us:
        expression: <leftop: term termop term>
        termop: 'or' | 'xor'
        term: <leftop: factor factorop factor>
        factorop: 'and'
        factor: '(' expression ')' | condition
        condition: field comparison value
        field: '<' timethingy '>'
        timethingy: 'DAY' | 'WEEK'
        comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
        value: /\d+/

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

      Note: Before I start, I have noticed your other suggestion, however this is based off of this code.

      Well, I've modified your suggestion a little, and assuming I understood your suggestion's intent, I seem to have screwed it up somewhere along the line, because it still does not handle ( ) properly, but I can't see why. This is what I now have:

      use Parse::RecDescent;
      use Data::Dumper;
      use strict;
      use warnings;

      $::RD_TRACE = 1;

      my $grammar = q~
          expression: <leftop: term termop term> eod

          termop: /and/i | /xor/i | /or/i

          term: '(' <commit> expression ')'
              | condition

          condition: element comparison element

          element: '<' <commit> /-?\w+/ '>'
                 | /\d+/

          comparison: /=[><]=/ <commit> <error: Unable to match comparison>
                    | /=?[><]=?/
                    | '='
                    | '!='

          eod: /^\Z/
      ~;

      my $parser = new Parse::RecDescent($grammar) or die;

      #defined($parser->RecTest('h =>= h')) or die;
      my $test = '(<MONTH> => <DAY>)'; # or (<MONTH> = <MARCH> and <DAY> = <TUESDAY>)';

      print Dumper($parser->expression($test));

      I repeatedly get "$VAR1 = undef;" in reply, as well as enough debugging info to scroll off my screen. (Unfortunately I can't capture it into a file either: Parse::RecDescent overrides any re-declarations on STDERR, and I can't use "perl language.pl | more" or any other similar methods, since those don't capture the error messages.) Does anyone see what I'm missing? It's probably obvious, but I've been staring at it for 4 hours now without anything more constructive happening than the condensing of a few rules in the grammar declaration.





      My code doesn't have bugs, it just develops random features.

      Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

        Flame feels ridiculously stupid.

        Ok, found THAT problem. The "eod" rule at the end of "expression" was being invoked every time expression was used, including the nested call inside the parentheses, meaning that the ')' had to somehow come after the end of the data. With that fixed, I need to do a few more tests, but this is how it looks NOW:

        my $grammar = q~
            logic: expression eod

            expression: <leftop: term termop term>

            termop: /and/i | /xor/i | /or/i

            term: '(' <commit> expression ')'
                | condition

            condition: element comparison element

            element: '<' <commit> /-?\w+/ '>'
                   | /\d+/

            comparison: /=[><]=/ <commit> <error: Unable to match comparison>
                      | /=?[><]=?/
                      | '='
                      | '!='

            eod: /^\Z/
        ~;




        My code doesn't have bugs, it just develops random features.

        Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

Re: Parse::RecDescent and mini-language parsing
by castaway (Parson) on Mar 30, 2003 at 17:14 UTC
    Your expressions and subexpressions look a little too complicated to me. Try setting these two variables:

    $::RD_TRACE = 1;
    $::RD_HINT = 1;

    And watch what happens. The ')' that it prints is the result of the first set of brackets (if you don't create any actions, a rule returns the last thing that successfully matched). It threw the rest away as it couldn't match it.
    Here's a snippet of something that I was using to evaluate expressions; maybe it helps:

    condition: booleanexpr
             | '&&' booleanexpr
             | '||' booleanexpr

    booleanexpr: '!' booleanexpr
                   { $return = $item[1] . $item[2]; }
               | '(' booleanexpr ')'
                   { $return = $item[1] . $item[2] . $item[3]; }
               | Value_op comparator Value_op
                   { $return = $item[1] . $item[2] . $item[3]; }

    comparator: '==' | '<' | '>' | '<=' | '>=' | '!='

    Value_op: operation | Value

    Value: String | Number | Float

    operation: operator plusminus(?)
        {
            my $op;
            $op = '';
            if (@{$item[2]}) { $op = $item[2]->[0]; }
            $return = $item[1] . $op;
            1;
        }

    plusminus: '+' operator
                 { $return = $item[1] . $item[2]; }
             | '-' operator
                 { $return = $item[1] . $item[2]; }

    operator: ('*' Value | '/' Value)
            | '(' operation ')'
            | Value

    C.
Re: Parse::RecDescent and mini-language parsing
by Flame (Deacon) on Mar 30, 2003 at 05:46 UTC
    Cleaning up a little: COMPARE rule now reads:
    COMPARE: /(?!=[><]=)=?[><]=?/ | '!=' | '='




    My code doesn't have bugs, it just develops random features.

    Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)
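
To see what the revised COMPARE pattern above accepts, it can be tried outside the grammar. This is only a sketch in plain Perl: the helper name match_compare is made up for illustration, and the \A anchor is an assumption standing in for the way Parse::RecDescent matches a pattern at the current parse position.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the thread's code): return the operator
# that the revised COMPARE pattern matches at the start of $text, or undef.
# The \A anchor imitates matching at the parser's current position.
sub match_compare {
    my ($text) = @_;
    my ($matched) = $text =~ /\A((?!=[><]=)=?[><]=?)/;
    return $matched;
}

for my $op ('<', '>=', '=>', '=<', '=>=', '=<=', '=') {
    my $got = match_compare($op);
    printf "%-4s %s\n", $op, defined $got ? "matches '$got'" : 'no match';
}
```

The doubled forms =>= and =<= are rejected by the negative lookahead. A bare '=' is also not matched by this pattern, which is why the rule keeps '=' and '!=' as separate alternatives.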

Solution: Parse::RecDescent and mini-language parsing
by Flame (Deacon) on Apr 04, 2003 at 21:18 UTC
    Well, just so everyone knows, I finally found the solution:
    my $grammar = q~
        #This is what I actually *use*  $parser->logic($string);
        logic: expression eod { $return = $item[1]; }

        expression: <leftop: term termop term>

        termop: /and/i | /xor/i | /or/i

        term: '(' <commit> expression ')' { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
            | condition

        condition: element comparison element { $return = main::process(@item[1..3]); }

        element: '<' <commit> /-?\w+/ '>' { $return = "<$item[3]>"; } #Return this so that the condition value can be set
               | /\d+/ # num is automatically returned

        comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
                  | /=?[><]=?/
                  | '='
                  | '!='

        eod: /^\Z/
    ~;




    My code doesn't have bugs, it just develops random features.

    Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)
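
The grammar above hands each condition to main::process, which is not shown in the thread, so how the final result gets evaluated is left open. One possible way to finish the job is sketched below in plain Perl. It rests on assumptions that the thread does not confirm: that the leftop result is a flat array reference alternating values and operator strings (operators included because termop is a subrule), that a parenthesised group arrives as a nested array reference, and that process returns a true or false value. The name evaluate_expression is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a parse result of the ASSUMED shape
# [ value, op, value, op, value, ... ] from left to right.
# A value is either a plain true/false scalar (an already evaluated
# condition) or a nested array ref (a parenthesised group).
sub evaluate_expression {
    my ($node) = @_;
    return ($node ? 1 : 0) unless ref($node) eq 'ARRAY';

    my @items  = @$node;
    my $result = evaluate_expression(shift @items);
    while (@items) {
        my $op    = lc shift @items;    # the grammar matches operators case-insensitively
        my $right = evaluate_expression(shift @items);
        if    ($op eq 'and') { $result = ($result && $right) ? 1 : 0 }
        elsif ($op eq 'or')  { $result = ($result || $right) ? 1 : 0 }
        elsif ($op eq 'xor') { $result = ($result xor $right) ? 1 : 0 }
        else                 { die "Unknown operator '$op'\n" }
    }
    return $result;
}

# Shape assumed for: (<DAY> = 4 and <MONTH> = 3) or <DAY> > 20
# with the three conditions already evaluated to 1, 0 and 1.
my $tree = [ [ 1, 'and', 0 ], 'or', 1 ];
print evaluate_expression($tree), "\n";
```

Because the grammar uses a single leftop for and, or and xor, all three operators are folded strictly left to right here, with no precedence between them.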

      I still don't get your code here:
      comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1> | /=?[><]=?/ | '=' | '!='
      If you did it the way I said, it's simpler and more direct:
      comparison: '<=' | '<' | '=' | '>=' | '>' | '!='
      The matches are executed from left to right, so as long as you specify the longer match first, it does the right thing. Your code seems clumsy.

      Also, I really don't see the need for those commits. Again, they clutter up the grammar. Have you tried removing them?

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.
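
The point about listing the longer literal first can be illustrated without the parser. The sketch below is plain Perl, not Parse::RecDescent itself: first_match is a made-up stand-in for a rule whose alternatives are tried in order, with the first one that fits winning.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ordered-choice rule: try each literal in the
# order given and return the first one that is a prefix of $text.
sub first_match {
    my ($text, @alternatives) = @_;
    for my $literal (@alternatives) {
        return $literal if index($text, $literal) == 0;
    }
    return;    # nothing matched
}

my @longest_first  = ('<=', '<', '=', '>=', '>', '!=');
my @shortest_first = ('<', '<=', '=', '>', '>=', '!=');

# With '<=' listed first, the whole operator is consumed.
print first_match('<= 4', @longest_first), "\n";

# With '<' listed first, only '<' is taken, leaving '= 4' for the next rule.
print first_match('<= 4', @shortest_first), "\n";
```

In the second ordering the leftover '=' would then have to be matched by whatever follows the comparison, which is why the order of the alternatives matters.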

        Ok, I ran the benchmark, and it seems the regex form is faster. I have tested it several times with the following code and results:

        use Parse::RecDescent;
        use Date::Calc qw(:all);
        use Benchmark ':all';
        use strict;
        use warnings;

        my $grammar1 = q~
            logic: expression eod { $return = $item[1]; }

            expression: <leftop: term termop term>

            termop: /and/i | /xor/i | /or/i

            term: '(' <commit> expression ')' { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
                | condition

            condition: element comparison element { $return = main::process(@item[1..3]); }

            element: '<' <commit> /-?\w+/ '>' { $return = "<$item[3]>"; } #Return this so that the condition value can be set
                   | /\d+/ # num is automatically returned

            comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
                      | /=?[><]=?/
                      | '='
                      | '!='

            eod: /^\Z/
        ~;

        my $grammar2 = q~
            logic: expression eod { $return = $item[1]; }

            expression: <leftop: term termop term>

            termop: /and/i | /xor/i | /or/i

            term: '(' <commit> expression ')' { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
                | condition

            condition: element comparison element { $return = main::process(@item[1..3]); }

            element: '<' <commit> /-?\w+/ '>' { $return = "<$item[3]>"; } #Return this so that the condition value can be set
                   | /\d+/ # num is automatically returned

            comparison: '<=' | '<' | '=' | '>=' | '>' | '!='

            eod: /^\Z/
        ~;

        my $parser1 = new Parse::RecDescent($grammar1) or die;
        my $parser2 = new Parse::RecDescent($grammar2) or die;

        my $test = '<DAY> = 4 or <DAY> > 4 or <DAY> < 4 or <DAY> >= 4 or <DAY> <= 4 or <DAY> != 4';

        cmpthese(10000, {
            'regex' => sub { $parser1->logic($test); },
            'quote' => sub { $parser2->logic($test); },
        });

        Yielded:

               Rate quote regex
        quote 116/s    --   -5%
        regex 123/s    6%    --

        This one added the =>= which I wanted to avoid, and while it slowed down both slightly, the regex was still in the lead.

        use Parse::RecDescent;
        use Date::Calc qw(:all);
        use Benchmark ':all';
        use strict;
        use warnings;

        my $grammar1 = q~
            logic: expression eod { $return = $item[1]; }

            expression: <leftop: term termop term>

            termop: /and/i | /xor/i | /or/i

            term: '(' <commit> expression ')' { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
                | condition

            condition: element comparison element { $return = main::process(@item[1..3]); }

            element: '<' <commit> /-?\w+/ '>' { $return = "<$item[3]>"; } #Return this so that the condition value can be set
                   | /\d+/ # num is automatically returned

            comparison: /(=[><]=)/ <commit> <error: Unable to match comparison, $1>
                      | /=?[><]=?/
                      | '='
                      | '!='

            eod: /^\Z/
        ~;

        my $grammar2 = q~
            logic: expression eod { $return = $item[1]; }

            expression: <leftop: term termop term>

            termop: /and/i | /xor/i | /or/i

            term: '(' <commit> expression ')' { $return = $item[3]; } #[@item[1,3,4]]; } # Only include elements important to later processing
                | condition

            condition: element comparison element { $return = main::process(@item[1..3]); }

            element: '<' <commit> /-?\w+/ '>' { $return = "<$item[3]>"; } #Return this so that the condition value can be set
                   | /\d+/ # num is automatically returned

            comparison: '<=' | '<' | '=' | '>=' | '>' | '!='

            eod: /^\Z/
        ~;

        my $parser1 = new Parse::RecDescent($grammar1) or die;
        my $parser2 = new Parse::RecDescent($grammar2) or die;

        my $test = '<DAY> = 4 or <DAY> > 4 or <DAY> < 4 or <DAY> >= 4 or <DAY> <= 4 or <DAY> != 4 or <DAY> =<= 4';

        cmpthese(10000, {
            'regex' => sub { $parser1->logic($test); },
            'quote' => sub { $parser2->logic($test); },
        });

        Yielded:

               Rate quote regex
        quote 104/s    --   -7%
        regex 113/s    8%    --

        I admit, neither is as fast as I would like, but it certainly appears that the regex is the fastest method there, unless I made a mistake.

        Edit: If you feel this was biased in any way, feel free to suggest another string, or another test altogether. I don't use the Benchmark module frequently, and I may have inadvertently allowed for some bias.



        My code doesn't have bugs, it just develops random features.

        Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)

        Valid points both. Perhaps I should do a benchmark and see which of our methods is faster. It was my goal to decrease the number of or's in the production, but perhaps regexes aren't the way to go. As for the nature of the first line, I wanted to be sure that no one ever wrote =>= to simplify later processing, though looking at it now I don't think it would make much of a difference, but it does allow me to specify what the error was.

        As for the comments, this is part of a school project, and while the comments do need to be cleaned up, they will help me to explain in a hurry the purpose of each element to the person reviewing it.

        I'll look into those benchmarks and, if I remember, post them here soon... too much to do, too little time...



        My code doesn't have bugs, it just develops random features.

        Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)