jeffa has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on this one for several hours now and need a little help. :) I need to parse 'robot instructions' and execute them. The format of my language is:

label  opcode  operand operand ... operand
where the label is optional. So far i have implemented all 'opcodes' except 'IF', and that is what is giving me trouble. But first, let me explain the code.

My idea is to parse each line and add a sub reference (plus the args to the sub via an array ref) to some global array. After the parsing has completed, i then run a loop over that array and call each element as a subroutine. The reason for using two passes is to be able to implement a 'GOTO' command. The 'robot command language' also branches when an 'IF' command is encountered.
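To make the two-pass idea concrete, here is a minimal sketch in plain Perl (no P::RD). It is not the code from this post: `compile_program`, `run_program`, the three-opcode subset and the step limit are all made up for illustration. The first pass records where each label points, so the second pass can jump forward or backward.

```perl
use strict;
use warnings;

my $OPCODE = qr/^(?:ADD|GOTO|HALT)$/i;   # tiny subset, for illustration only

# Pass 1: turn source lines into [opcode, args...] records and note
# where each label points.
sub compile_program {
    my @lines = @_;
    my (@code, %label);
    for my $line (@lines) {
        (my $text = $line) =~ s/#.*//;          # drop trailing comments
        my @tok = split ' ', $text;
        next unless @tok;
        if ($tok[0] !~ $OPCODE) {               # leading non-opcode is a label
            my $name = shift @tok;
            $label{$name} = scalar @code;       # index of the next instruction
        }
        next unless @tok;
        my $op = uc shift @tok;
        push @code, [ $op, @tok ];
    }
    return (\@code, \%label);
}

# Pass 2: walk the instruction array; GOTO just rewrites the program counter.
sub run_program {
    my ($code, $label, $max_steps) = @_;
    my (%var, $pc);
    my %op = (
        ADD  => sub { $var{ lc $_[0] } += $_[1] },
        GOTO => sub {
            die "unknown label '$_[0]'\n" unless exists $label->{ $_[0] };
            $pc = $label->{ $_[0] } - 1;        # the loop's $pc++ lands on the label
        },
        HALT => sub { $pc = @$code },           # step past the end to stop
    );
    my $steps = 0;
    for ($pc = 0; $pc < @$code; $pc++) {
        die "step limit exceeded\n" if ++$steps > $max_steps;
        my ($name, @args) = @{ $code->[$pc] };
        my $handler = $op{$name} or die "unknown opcode '$name'\n";
        $handler->(@args);
    }
    return \%var;
}
```

Calling `compile_program` on the four lines `GOTO SKIP`, `ADD n 100`, `SKIP ADD n 1`, `HALT` and then running the result skips the second line, because the label `SKIP` was already known before execution started.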

Back to the problem. My grammar works except in this one example case:

IF int_or_var relop int_or_var label_name
when P::RD sees that the first 'int_or_var' is a 'var' and not an 'int', it mistakes 'IF' for a label. In other words, the first line below matches correctly to 'IF', but the second matches "incorrectly" to 'label':

IF 9 > 8 L1       # matches IF
IF rmove < 8 L1   # matches label
My question is, how do i specify my grammar to get the second line above to match 'IF' and not 'label'? Here is my code; it should run without any errors and its display should be self-explanatory. Also, any suggestions or improvements are very welcome. Thanks in advance. :)
use strict;
use warnings;
use Data::Dumper;
use Parse::RecDescent;

$RD_HINT = 1;
use vars qw( %item @item $return %symbol %axis @code $curr %label );

@axis{'X','Y','Z'} = (0,0,1);
@code = ();

my $parser = Parse::RecDescent->new(q(

   startrule : opcode | label opcode

   label: /[a-z]\w*/i
   {
      print "got label: $item[1]\n";
      $main::label{$item[1]} = $#main::code;
   }

   opcode: IF | YMOVE | XMOVE | DOWN | UP | PRINTLOC
         | GOTO | SET | ADD | SUB | HALT

   YMOVE: /YMOVE/i int_or_var
   {
      push @main::code, [
         sub { $main::axis{Y} += $_[0] },
         [$item{int_or_var}],
      ];
   }

   XMOVE: /XMOVE/i int_or_var
   {
      push @main::code, [
         sub { $main::axis{X} += $_[0] },
         [$item{int_or_var}],
      ];
   }

   DOWN: /DOWN/i
   {
      push @main::code, [ sub { $main::axis{Z}-- },[]];
   }

   UP: /UP/i
   {
      push @main::code, [ sub { $main::axis{Z}++ },[]];
   }

   PRINTLOC: /PRINTLOC/i
   {
      push @main::code, [
         sub { print "@{[map $main::axis{$_},sort keys %main::axis]}\n" },
         [],
      ];
   }

   IF: /IF/i int_or_var relop int_or_var label_name
   {
      print "got if: @item\n";
   }

   GOTO: /GOTO/i label_name
   {
      push @main::code, [
         sub { $main::curr = $main::label{$item{label_name}} },[]
      ];
   }

   SET: /SET/i var_name int_or_var
   {
      $main::symbol{$item{var_name}} = $item{int_or_var};
      push @main::code, [
         sub { $main::symbol{$_[0]} = $_[1] },
         [$item{var_name}, $item{int_or_var}],
      ];
   }

   ADD: /ADD/i var_name int_or_var
   {
      $main::symbol{$item{var_name}} += $item{int_or_var};
      push @main::code, [
         sub { $main::symbol{$_[0]} += $_[1] },
         [$item{var_name}, $item{int_or_var}],
      ];
   }

   SUB: /SUB/i var_name int_or_var
   {
      $main::symbol{$item{var_name}} -= $item{int_or_var};
      push @main::code, [
         sub { $main::symbol{$_[0]} -= $_[1] },
         [$item{var_name}, $item{int_or_var}],
      ];
   }

   HALT: /HALT/i
   {
      push @main::code, [ sub { exit(0); },[]];
   }

   label_name: /[a-z]\w*/i
   int_or_var: var_value | int
   var_name:   /[a-z]\w*/i
   var_value:  /[a-z]\w*/i
   {
      $return = $main::symbol{$item[1]};
   }
   int:        /-*\d+/
   relop:      '<' | '>' | '==' | '!=' | '>=' | '<='
));

$parser->startrule($_) while <DATA>;

for ($curr = 0; $curr < @code; $curr++) {
   my ($sub,$args) = @{$code[$curr]};
   $sub->(@$args);
}

__DATA__
YMOVE 10         # along y-axis
SET RMOVE 0      # sets RMOVE to 0
L1 ADD RMOVE 1   # adds 1 to RMOVE
xmove 1          # along x-axis
IF rmove < 8 L1
IF 9 < 8 L1
YMOVE -3
down             # along z-axis (lower gripper)
PRINTLOC
HALT

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Replies are listed 'Best First'.
Re: Help with tweaking Parse::RecDescent grammar
by PodMaster (Abbot) on Feb 11, 2003 at 01:30 UTC
    After setting $::RD_TRACE=1 and looking at the trace for a few minutes, I added the line
        IF RMOVE > 0 L1
    under data, right before your original
        IF rmove < 8 L1
    and I got
        got label: L1
        got if: IF IF 1 > 0 L1
        got label: IF
        got if: IF IF 9 < 8 L1
        1 7 0
    which would indicate that variable names are case sensitive. You might wanna do a $main::symbol{lc $item{var_name}} from now on ;)

     
    Ain't nothin' like a second set of peepers ;)(in case you lose the first pair)

    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    ** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Help with tweaking Parse::RecDescent grammar
by tall_man (Parson) on Feb 11, 2003 at 03:43 UTC
    Most computer languages use the concept of reserved words, and in this case I think it would simplify your grammar a lot if labels and opcodes were distinct. Here is one way to do it that I adapted from an answer in this mailing list thread (I haven't tried it).
    opcode: IF | YMOVE | XMOVE | DOWN | UP | PRINTLOC | GOTO | SET | ADD | SUB | HALT
    label: ...!opcode /[a-z]\w*/i
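    For anyone who wants to see the reserved-word idea outside of a grammar, here is a rough plain-regex analogue of the `...!opcode` lookahead. This is only a sketch: `split_label` and `$label_re` are hypothetical names, and it does not use P::RD at all. The `\b` in the lookahead is what lets an identifier such as IFFY still count as a label.

```perl
use strict;
use warnings;

my @opcodes   = qw(IF YMOVE XMOVE DOWN UP PRINTLOC GOTO SET ADD SUB HALT);
my $opcode_re = join '|', @opcodes;

# An identifier that is NOT a reserved opcode: the negative lookahead
# rejects the match before the identifier pattern is even tried.
my $label_re = qr/(?!(?:$opcode_re)\b)[a-z]\w*/i;

# Split an optional leading label off a line.
# Returns (label, rest) or (undef, whole line) when there is no label.
sub split_label {
    my ($line) = @_;
    return $line =~ /^\s*($label_re)\s+(.*)/s ? ($1, $2) : (undef, $line);
}
```

    With this, `split_label("L1 ADD RMOVE 1")` yields the label L1 and the rest of the line, while `split_label("IF rmove < 8 L1")` yields no label, whatever the case of the opcode.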
Re: Help with tweaking Parse::RecDescent grammar
by extremely (Priest) on Feb 11, 2003 at 05:08 UTC
    You could disambiguate labels with a marker character. Just like in the P::RD code itself,
        L1: ADD RMOVE 1

    Thus labels are terminated with a special character and your grammar never gets confused...
        label_name: /[a-z]\w*:/i

    --
    $you = new YOU;
    honk() if $you->love(perl)

      That doesn't solve his problem though. The problem isn't that IF matches a label, the problem is that the parser is trying to match IF to be a label. That means the parser has already rejected the line to be of the form opcode operand(s).

      Hence, the parsing of the IF statement failed. Where does it fail? Easy, var_value returns the value from a code block. It's looking up the value of rmove, but that variable wasn't set. Instead, the variable RMOVE was set.

      The problem here is that the parser is already doing the calculations, which causes it to parse the language differently depending on the outcome of those calculations. Icky.

      Anything more complicated than a calculator should probably have separate compile and runtime phases.
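      A minimal sketch of what that separation could look like for IF, in plain Perl rather than as a P::RD action. The names `compile_operand`, `compile_if` and `%relop` are invented here, and treating an unset variable as 0 and lower-casing variable names are assumptions, not something the assignment specifies. The point is that compiling stores the variable's name and only the returned closure looks up its value.

```perl
use strict;
use warnings;

my %symbol;   # runtime symbol table, keyed by lower-cased variable name

# "Compile" an operand into a closure instead of looking its value up now.
sub compile_operand {
    my ($tok) = @_;
    return sub { $tok } if $tok =~ /^-?\d+$/;   # integer literal
    my $name = lc $tok;                         # assumption: rmove and RMOVE are one variable
    return sub { $symbol{$name} // 0 };         # assumption: unset variables read as 0
}

my %relop = (
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
);

# Returns a closure that yields the label to jump to, or undef to fall through.
sub compile_if {
    my ($lhs, $op, $rhs, $label) = @_;
    my $cmp = $relop{$op} or die "unknown relop '$op'\n";
    my ($l, $r) = (compile_operand($lhs), compile_operand($rhs));
    return sub { $cmp->($l->(), $r->()) ? $label : undef };
}
```

      Because nothing is looked up while compiling, `compile_if('rmove', '<', 8, 'L1')` succeeds even if no SET has been seen yet, so whether the line parses no longer depends on the symbol table.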

      Abigail

      Good idea, but this is for homework! ;) There is a Python course being taught this semester by one of my favorite professors. This is their second project, and my first thought upon seeing it (other than they should be learning Perl instead) was to attack the problem with P::RD. At first, PodMaster's suggestion worked, then i changed my code and i am still having a hard time matching that darned 'IF'. Welcome back man, it's been too long. ;)

      jeffa

      
Re: Help with tweaking Parse::RecDescent grammar
by castaway (Parson) on Feb 11, 2003 at 15:02 UTC
    Abigail is correct.
    Essentially you have to remember that any action which results in 'undef' will fail the entire rule. The action in var_value seems to be setting $return to 'undef', which fails the IF rule, which is why it is thought to be a label.
    (Sticking a '1;' after the '$return = ...' line corrects the result, but not the actual problem: the grammar will still only parse correct code, i.e. when the variable has been created before the parser gets to the IF statement.)

    C.

(jeffa) Re: Help with tweaking Parse::RecDescent grammar
by jeffa (Bishop) on Feb 11, 2003 at 16:24 UTC

    Ok, finally got it! I'd like to thank (in no particular order) PodMaster, castaway, bart, Abigail-II, merlyn (supersearch++), demerphq, tall_man, and extremely for the help. For the record, PodMaster got my code working, but it was castaway that first spotted the undef problem with $return. Abigail-II hit the nail on the head about the double calc/parsing that i was doing: ICK! ;)

    For those interested, here is the revised code in working order. I opted to eval strings during runtime instead of calling anonymous subroutines. The reason is because i need to use the values of particular vars at runtime, not compile time, and eval'ing strings seems to me to be a better fit for that.

    And here are some sample input files to test out:

    UPDATE:
    Thanks for the bullet-proofing, demerphq. That regular expression for relop is very nice. :)

    thanks again :)

    jeffa

    
      Hi jeffa, sorry I wasn't able to help out more when we spoke in the cb. I finally had some time this evening and pulled the code off your scratch pad. I haven't looked at your latest solution; I thought this way would be more educational for both of us. :-)

      I found a couple of issues with the grammar besides the return undef problem. In a couple of cases you wrote the grammar as though it would be handled the same way that yacc would. For instance,

      relop : '<' | '<='
      This is bad because it will cause 'foo <= 10' to be parsed as 'foo < = 10': the alternatives are tried in order and '<' matches first. Replacing it by a single regex to match the appropriate options is better here (and faster).
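      The ordering effect is easy to reproduce with an ordinary Perl regex, since regex alternation is also first-match. The sketch below is only an illustration; `first_relop`, `$naive` and `$longest` are names invented here, and it is not necessarily the regex used in the revised grammar, which is not shown in this thread.

```perl
use strict;
use warnings;

# Same order as the original relop rule: the short operators come first.
my $naive   = qr/<|>|==|!=|>=|<=/;

# Two-character operators first, so '<=' is tried before '<'.
my $longest = qr/<=|>=|==|!=|<|>/;

# Return the first operator the given pattern finds in the text.
sub first_relop {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

print first_relop($naive,   'foo <= 10'), "\n";   # prints "<"
print first_relop($longest, 'foo <= 10'), "\n";   # prints "<="
```

      Listing the longer operators first (or using a character class with an optional '=') avoids splitting '<=' into '<' followed by a stray '='.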

      Another one was the way you handled the label. If the opcode part following a label fails the parse, the label is still assigned. A similar (and very common) mistake occurs in your startrule. Your grammar can match a sentence that starts off valid and then degenerates. Perhaps this is the behaviour you wanted, but I'm guessing you weren't aware. (Incidentally, I don't recall that it's well documented, but the startrule doesn't have to be called that: you can call $parser->any_defined_rule($foo) and have it start parsing from that rule. This can be useful for debugging a particular rule, and it also makes using the parser look a little nicer.)

      If you uncomment the various rules and comment out their complements then you'll see the failures occur (you really should trap to see if the parse failed :-). I also cleaned things up a bit. Now a single rule 'identifier' fills in for all the places where label_name and var_name were being used.

      Anyway, interesting stuff. I kinda wonder about some things in it: I don't understand why the variables in IF are evaluated at compile time and not run time, nor why there isn't more checking of values in the label and symbol tables. I'm kinda interested now to review your latest post. :-)

      Cheers, I enjoy hacking around with P::RD. I wish I had more cause to use it in anger. :-)

      Update: Ok, I read the latest code, and I like the improvements. You resolved at least a couple of the issues I mention above. But some are still open and waiting to bite you.

      Goodnight and HTH

      --- demerphq
      my friends call me, usually because I'm late....