After reading Brother hhdave's meditation about Lisp to Perl compilation I started to ponder the wonders of macros and what kinds of source transformations we could do with perl's source filters. The concensus seems to be that source filters are better than the simple text substitutions of CPP, but are lesser than Lisp macros, since Lisp macros transform the code after the parsing stage. So I wanted to see just what could be accomplished with the resources at hand. I figured it would be best not to try to roll my own parser, especially since perl's already got a pretty darn good perl parser. Since I didn't want to delve too far into perl's guts or mess around with op-codes, I decided to take a look at B::Deparse. B::Deparse is code generator which takes a compiled perl program's internal representation and converts it back to perl code. I figured I'd take my macro, compile it, B::Deparse it, and use this representation at a template for matching against a deparsed version of the source code we want to macroify. My first filter employing this concept tries to unroll 3-part for(;;) loops with fewer than 10 iterations. (Warning! This is pre-alpha quality proof-of-concept code.)
package Loop_unroll; use strict; use Filter::Simple; FILTER { my $unroll_limit = 10; #max unrolling depth my ($var1, $cons2, $var3, $cons4, $var5, $body6); #This is the macro replacement pattern we're looking for... my $macro = 'for ($var1=$cons2;$var3<$cons4;$var5++) {$body6}'; #run the macro snippet through B::Deparse (better way to grab outp +ut?) my $compiled_macro=`perl -MO=Deparse,-x9 -e'$macro'`; #screws with + ', FIX ME. #compile the main script my $source = `perl -MO=Deparse,-x9 -e'$_'` if $_; #same proble +m, Bad dog. #These are crude regex to try to match perl variables and integers $var1 = $var3 = $var5 = '(\$[a-zA-Z][a-zA-Z0-9]*)'; $cons2 = $cons4 = '(\d+)'; $body6 = '([^}]+)'; #barf, should be something nicer, no nested bl +ocks #deswizzle variable order (mangled by compile) #variables should be guarenteed to occur only once because of side +-effects my $i; my $reorder; while ($compiled_macro=~/(var1|cons2|var3|cons4|var5|body6)/g) { $reorder.="\$$1 = \$".++$i.";"; # map vars to capturing parens } $compiled_macro=~s/([|(){}\^*+?.\[\]])/\\\\$1/g; #de-meta stuff, n +ot $ (ick) my $regexed_macro; eval '$regexed_macro=qq/'."$compiled_macro".'/'; #interpolate patt +ern match junk $source =~ s{$regexed_macro} { eval $reorder; # better method? ($var1 eq $var3 and $var1 eq $var5 and $cons4 - $cons2 < $unroll_limit) ? "$var1 = $cons2;".("{$body6;} $var1++;") x ($cons4-$cons2) : $&; #Lazy like a fox. }eg; $_ = $source; #print STDERR "$source\n"; }; 1;
And here's a program you can run it against...
#!/usr/bin/perl -w use strict; use Loop_unroll; my $x; for($x=0;$x<8;$x++) { my $f=fact($x); print "factorial($x)=$f\n"; } sub fact { return(1) if ($_[0] <= 0); return $_[0]*fact($_[0]-1); }
My second filter attempts to undo one level of recursion in simple tail-recursive subroutines.
package Recur_unroll; use strict; use Filter::Simple; FILTER { my ($name1, $body2, $name3, $param4, $body5); #This is the macro replacement pattern we're looking for... my $macro = 'sub name1 {$body2; return name3($param4); $body5}'; #run the macro snippet through B::Deparse (better way to grab outp +ut?) my $compiled_macro=`perl -MO=Deparse,-x9 -e'$macro'`; #screws with + ', FIXME. #compile the main script my $source = `perl -MO=Deparse,-x9 -e'$_'` if $_; #same proble +m, Bad dog. #These are crude regex to try to match perlisms $name1 = '\s+([a-zA-Z][a-zA-Z0-9]*)\s+'; $body2 = '(.*?'; $name3 = '[^;]*?(\1'; $param4 = '([^)]*?)'; #no parens allowed inside sub parameter lis +t $body5 = ').*?)'; #deswizzle variable order (possibly mangled by compile) #variables should be guarenteed to occur only once because of side +-effects my $i; my $reorder; while ($compiled_macro=~/(name1|body2|name3|param4|body5)/g) { $reorder.="\$$1 = \$".++$i.";"; # map vars to capturing parens } $compiled_macro=~ s/name/\$name/g; #makes subroutine names var for + interpolation $compiled_macro=~ s/(body\d);/$1/g; #get rid of dummy placeholder +semi's $compiled_macro =~ s/([|(){}\^*+?.\[\]])/\\\\$1/g; #de-meta stuff, + not $ (ick) my $regexed_macro; eval '$regexed_macro=qq/'."$compiled_macro".'/'; #interpolate patt +ern match junk $source =~ s{$regexed_macro} { eval $reorder; $name3=quotemeta($name3); $body2=~s/$name3/eval{local \@_=$param4;$body2 $bo +dy5};/; "sub $name1 { $body2 $body5 }"; }esgx; $_ = $source; #print STDERR "$source\n"; }; 1;
That's not too terribly many lines of code, although its power is limited by the need for your macros to be compilable. Then there is the fact that there is still a certain amount of tinkering necessary to get each case to work correctly. And, of course, the macros I chose to implement are probably of dubious value. Can this approach be generalized/extended? Is it unbearably fragile? If perl5 had perl6-like macros available today, what would you try to accomplish with them? I'll leave those questions as an excersize to the reader :-)

Greg Buchholz

Replies are listed 'Best First'.
Re: Playing with (macro/source-filter) fire
by stvn (Monsignor) on Jan 23, 2004 at 20:01 UTC
    sleepingsquirrel,

    Intruiging idea. I am a fan of LISP style (runtime code expansion) more than C style (text substitution) macros myself. So this node grabbed my attention. But I tried running your code, which worked, but when I added a few print statments to see the guts of it, it looked to me that nothing is really happening.

    Here is the output I got:

    MACRO (NOT-COMPILED): for ($var1=$cons2;$var3<$cons4;$var5++) {$body6} -e syntax OK MACRO (COMPILED & DEPARSED): while ($var3 < $cons4) { $body6; } continue { ++$var5 } SOURCE (NOT-COMPILED): #use Recur_unroll; my $x; for($x=0;$x<8;$x++) { my $f=fact($x); print "factorial($x)=$f\n"; } sub fact { return(1) if ($_[0] <= 0); return $_[0]*fact($_[0]-1); } 1; -e syntax OK SOURCE (COMPILED & DEPARSED): my $x; while ($x < 8) { my $f = fact($x); print "factorial($x)=$f\n"; } continue { ++$x } sub fact { $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); } '???'; REORDER: $var3 = $1;$cons4 = $2;$body6 = $3;$var5 = $4; REG-EXED MACRO: while \((\$[a-zA-Z][a-zA-Z0-9]*) < (\d+)\) \{ ([^}]+); \} continue \{ \+\+(\$[a-zA-Z][a-zA-Z0-9]*) \} FINAL SOURCE: my $x; while ($x < 8) { my $f = fact($x); print "factorial($x)=$f\n"; } continue { ++$x } sub fact { $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); } '???'; MACRO (NOT-COMPILED): for ($var1=$cons2;$var3<$cons4;$var5++) {$body6} -e syntax OK MACRO (COMPILED & DEPARSED): while ($var3 < $cons4) { $body6; } continue { ++$var5 } SOURCE (NOT-COMPILED): SOURCE (COMPILED & DEPARSED): REORDER: $var3 = $1;$cons4 = $2;$body6 = $3;$var5 = $4; REG-EXED MACRO: while \((\$[a-zA-Z][a-zA-Z0-9]*) < (\d+)\) \{ ([^}]+); \} continue \{ \+\+(\$[a-zA-Z][a-zA-Z0-9]*) \} FINAL SOURCE: factorial()=1 factorial(1)=1 factorial(2)=2 factorial(3)=6 factorial(4)=24 factorial(5)=120 factorial(6)=720 factorial(7)=5040
    And here is the altered Loop_unroll module:

    Am I missing something?
    (BTW - I am on Mac OS X (10.3.2) with Perl v5.8.0)

    -stvn
      Hmmm. I don't see what went wrong, but here's the output of the filter I get with use Unroll_loop after uncommenting the last print STDERR $source;
      my $x; $x = 0;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;{my $f = fact($x); print "factorial($x)=$f\n";} $x++;sub fact { $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); }
      And here's the result of Recur_unroll...
      my $x; $x = 0; while ($x < 8) { my $f = fact($x); print "factorial($x)=$f\n"; } continue { ++$x } sub fact { $_[0] <= 0 and return 1; return $_[0] * eval{local @_=$_[0] - 1; $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); }; }

        Still not working for me. I tried downloading your code again (in case it was me), and ran both the Loop and Recurse filters. I have Filter::Simple version 0.79 (i upgraded this too), which uses Text::Balanced which I have version 1.89. Again, Mac OS X 10.3.2 with the default installed Perl 5.8.0. Should I upgrade anything?

        What platform are you on? I mean, the code looks like it should work. I dont see why its not though.

        -stvn
      After running your version of the code on my machine I notice a couple of differences between your results and mine. When you print out the results of the compiled and deparsed macro, you get...
      MACRO (COMPILED & DEPARSED): while ($var3 < $cons4) { $body6; } continue { ++$var5 }
      while I get...
      MACRO (COMPILED & DEPARSED): $var1 = $cons2; while ($var3 < $cons4) { $body6; } continue { ++$var5 }
      ...I notice that the first line $var1 = $cons2; is missing in your report. And strangely enough there's a line is missing from your compilied and deparsed source code ...
      SOURCE (COMPILED & DEPARSED): my $x; while ($x < 8) { my $f = fact($x); print "factorial($x)=$f\n"; } continue { ++$x } sub fact { $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); } '???';
      vs. my machine
      SOURCE (COMPILED & DEPARSED): my $x; $x = 0; while ($x < 8) { my $f = fact($x); print "factorial($x)=$f\n"; } continue { ++$x } sub fact { $_[0] <= 0 and return 1; return $_[0] * fact($_[0] - 1); }
      ...the $x = 0; has gone missing. So this causes $reorder to differ...
      REORDER: $var3 = $1;$cons4 = $2;$body6 = $3;$var5 = $4;
      vs.
      REORDER: $var1 = $1;$cons2 = $2;$var3 = $3;$cons4 = $4;$body6 = $5;$va +r5 = $6;
      and is probably what causes the substitution match failure and thus you get the original code back. I don't know why lines would suddenly dissapear. Is it a buffering problem with the back-tick operator?

      update:I'd try just running...

      perl -MO=Deparse,-x9 -e'for($i=0;$i<10;$i++){$x++}'

      ...all by itself and see if it matches...

      $i = 0; while ($i < 10) { ++$x; } continue { ++$i }
      That's straight from 3rd edition camel p. 480. Anyone else have any ideas? BTW, I'm running perl 5.6.1 under Slackware Linux

        Hmm, I re-downloaded all your code, tried it again, no go. My guess is that B::Deparse on OS X is different that Linux. My version of Deparse is one behind the latest, so that might play a role in this.

        Cool stuff none the less, thanks for showing your output, so I could see the guts of what it was doing though.

        -stvn