mush4brains has asked for the wisdom of the Perl Monks concerning the following question:

In a RecDescent grammar, I'd like to add support in my language for "macros".
These macros would be very much like #define macros in C and would support arguments that are instantiated into the expansion. In C, this looks something like:
#define add(a,b) (a+b)
In my language, I'd like something like:
myMacro(x,y): any_expression
where "any_expression" is any old (deep) expression defined elsewhere in the grammar, and (x,y) would be replaced/instantiated in that expression.

This may seem straighforward, but, well, it ain't to me. (I am, after all, mush4brains.)
In my P::RD grammar, the "macro" rule without arguments would look like:
macro: identifier ':' expr
where, again, expr is defined fully in the grammar. Adding arguments to this... I need to instantiate my (x,y) into the "expr" string before the subrules interpret it.

Now... I thought of using standard Perl substitution in a pre-scan, but this seems less-than-ideal, since it requires some knowledge of the grammar in the Perl REs.

I thought of using two separate grammars, the first of which is a simplified version of the full second grammar, but this also seems a bit redundant.

Then I thought... "This hurts".
And then... "Hey, I'll ask the Monks!!!"

So... what am I missing here? Something obvious, I imagine?

Thanks for any help.
Jim Wiltshire

Replies are listed 'Best First'.
•Re: Macros in a RecDescent grammar
by merlyn (Sage) on Jun 21, 2002 at 21:29 UTC
    You can edit $text in a rule to achieve a "pass-and-a-half" parsing. When you recognize a macrodef, store the key into a hash, with the value being the replacement string. Then, whenever you are looking for an identifier, also look first to see if it's one of the keys, and if so, prepend $text with the corresponding value and accept it.

    I've been thinking about writing a column on that, but haven't finished. I saw a good example of that in theDamian's class notes though.

    -- Randal L. Schwartz, Perl hacker


    update: The file demo_cpp.pl in the Parse::RecDescent distro has a good example of mangling $text.
      I'm currently doing something like that, though I'm not actually updating $text:
      macro: identifier ":" expr { if (defined $_macro{$item[2]->getVal()}) { # ERROR -- duplicate macro definitions } else { $_macro{$item[2]->getVal()} = $item[4]; $return = $item[2]; } }
      Rather than update $text, I build my op-tree branch directly from expr. (And it's working so far...)
      Adding arg's to this seems harder, since the instantiation-points can be many subrules deeper.
      - Jim
      Macros will have to be defined prior to use then though, unless I'm a mile off on how this'd work. (Not that that's bad, but it would differ from a two pass compiler in this respect and might be a point to consider.)

      Makeshifts last the longest.

      (Merlyn, I look forward to seeing your column on this, by the way. - Jim)
Re: Macros in a RecDescent grammar
by dws (Chancellor) on Jun 21, 2002 at 21:08 UTC
    In a RecDescent grammar, I'd like to add support in my language for "macros".

    Macros are most often implemented as a separate grammar, and macro processing is done by a separate logical process done prior to parsing (possibly even prior to lexical analysis). Trying to mix macros into your primary grammar is going to be an uphill battle. The received wisdom is "keep the two separate", though you might learning something by trying.

Re: Macros in a RecDescent grammar
by kvale (Monsignor) on Jun 21, 2002 at 21:04 UTC
    You can think of a macro as a procedure in which the body of the procedure is substituted whenever invoked. Thus invoking macros entails changing the source code as it is being parsed. Expanding macros at the same time you parse code is a tricky proposition; look at Tex or Tcl to see some of the issues involved. Generally it is easier to make your compiler two-pass, one for macro expansion and one for parsing the resulting code.

    -Mark
      Thanks, Mark.
      I suspect two passes is somehow necessary, one way or the other, but I thought maybe I was missing one of the deeper features of the amazing P::RD.
      - Jim
Re: Macros in a RecDescent grammar
by Anonymous Monk on Jun 22, 2002 at 16:19 UTC
    Damian Conway answered a similar question in a talk last week. (Inserting optional semi-colons and continuing to parse.)

    When you recognize the macro, modify the text to replace the macro with what should be there, throw an exception, and then retry the parse.

    Inefficient? Well you are using Parse::RecDescent so you obviously cannot be worried about that!

Re: Macros in a RecDescent grammar
by mush4brains (Acolyte) on Jun 25, 2002 at 13:30 UTC
    The two-step seems the best (and only?) option here ...
    As an exercise, I added basic macro argument-handling to Damian's demo_cpp.pl, included in Parse-RecDescent.
    In demo_cpp.pl, change the "demacro" function to:
    sub demacro($) { my $text = shift; while (($macro,$pkg) = each %macro ) { my $defn = $$pkg[0]; if (defined $$pkg[1]) { my @args = @{$$pkg[1]}; while ($text =~ m/($macro\((.*?)\))/g) { my $oA = length($`), $oZ = length($&); my @vals = split /\s*,\s*/, $2; (@args eq @vals) or die "Bad arg count for macro '$macro'\n"; my $_defn = $defn; for ($i=0; $i<@args; $i++) { $_defn =~ s/$args[$i]/($vals[$i])/g; } substr($text, $oA, $oZ) = $_defn; } } else { $text =~ s/$macro/$defn/; } } return $text; }
    And change the "macrodef" rule to:
    macrodef : '#define' /[a-z]\w*/i '(' <leftop: /[a-z]+/i ',' /[a-z]+/i> ')' /.*/ { $::macro{$item[2]} = [ $item[-1], $item[4] ]; } | '#define' /[a-z]\w*/i /.*/ { $::macro{$item[2]} = [ $item[-1] ]; }
    This would process #define's like in:
    #define add(a,b,c) (a+b+c) #define mult(d,e) (d*e) #define pi (3.1415) main( ) { add(1,2,3); mult(4,5); float p = pi; }
    Note that this is no less "naive" than the original demo -- simple string substitution is used for argument instantiation (though this does add parens around each arg instance to preserve operator precedence). E.g., using args "a" and "ab" would fail since the "a"-instantiation would intrude on part of the "ab". A real algorithm would need to rectify this (with smart tokenization, etc.).

    Thanks for the help.
    Jim