Basically i was playing around with the idea of implement an extremely simplistic language, and parsing it using perl. By simple i mean something like if/else structures, a while loop, and some vars. But the problem is, i have no idea how i could go about implementing this (and no, evaling code is probably not a good idea). I did some quick searches but i could find nothing useful. So basically im looking for advice, pointers, what modules i should use, examples, anything that would help me with this.

Replies are listed 'Best First'.
Re: Implementing a toy language in perl?
by jryan (Vicar) on Jun 07, 2002 at 05:03 UTC
    You might want to take a look at Parse::RecDescent, which was made to do heavy parsing like this. It isn't exactly a speed demon, however; so you might to look at other parsing methods as well. Japhy has a section in his upcoming regex book, Regular Expressions in Perl, about this. Ironically enough, its titled Parsing a Language. Take a look, its good stuff.
Re: Implementing a toy language in perl?
by Zaxo (Archbishop) on Jun 07, 2002 at 06:53 UTC

    First, you got to learn standardese. For that, don't read the spec for TURKEY_BOMB.

    There are lots of things you might find interesting at that site.

    After Compline,
    Zaxo

Re: Implementing a toy language in perl?
by stefp (Vicar) on Jun 07, 2002 at 20:29 UTC
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Implementing a toy language in perl?
by theorbtwo (Prior) on Jun 07, 2002 at 04:26 UTC
      To clarify, there are some scripts that compile simple languages down to parrot bytecode. Get yoruself over to dev.perl.org These scripts are what you want to look at. In particualar, there is a script to implement something called 'babyperl' - a simplistic version of perl.

      ____________________
      Jeremy
      I didn't believe in evil until I dated it.

        Couldn't have said it better myself. (Sorry. Sometimes, I try to be breif, and cut to much. Other times, I don't try to be breif, and say to much.)


        We are using here a powerful strategy of synthesis: wishful thinking. -- The Wizard Book

      Er, parrot is a toy language?
        Nope, but Parrot's pretty easy to implement toy languages on top of. Real languages too, of course.
Re: Implementing a toy language in perl?
by Anonymous Monk on Sep 18, 2016 at 21:55 UTC

    Here's an example for you. It implements a very simple language, see the examples/testcases in the DATA section.

    It uses a modified Pratt parser (sort of, maybe) to build a parse tree (Abstract Syntax Tree) in perl objects. It then prints the tree (in indent style).

    Finally it runs the tree (using perl's polymorphism) by sending the "v" method to the top node of the tree.

    It hasn't been extensively tested, so there may be a bug or two, but that's part of the fun, isn't it?

    I hope this complete running example helps...

    #!/usr/bin/perl # toylanguage.pl - a toy language for an example (by tybalt89) # perl'ified https://en.wikipedia.org/wiki/Pratt_parser # see __DATA__ section for examples of the language use strict; use warnings; my %memory; my ($running, $hasprint); # various flags my $ws = qr/(?:#.*|\s+)*+/; # white space my $aws = qr/(?:#.*|\s+)++/; # actual white space sub err { exit print "\n**ERROR** ", $running ? "@_" : s/\G/<** @_ **>/r, "\n" } sub getlist # for print and call { my @list = expr(5); push @list, expr(5) while /\G$ws , /gcx; return @list; } sub namelist # for formal parameter names { my @list; $1 eq 'as' ? return @list : push @list, $1 while /\G$ws ([a-z]\w*)\b /gcxi; err 'missing "as"'; } sub node { bless [ @_[1..$#_] ], $_[0] } sub closeparen { /\G$ws\)/gc ? shift : err 'missing )' } sub expr # the parser :) { /\G$aws/gc; my $tree = /\G - /gcx ? node NEG => expr(12) : # unary minus /\G \+ /gcx ? +expr(12) : # unary plus /\G \( /gcx ? closeparen expr(0) : /\G ((?:\d+(?:\.\d*)?|\.\d+)(?:[Ee][+-]?\d+\b)?) /gcx ? node NUM = +> $1 : /\G print\b /gcx ? ($hasprint++, node PRINT => getlist) : /\G for\b /gcx ? node FOR => /\G$ws ([a-z]\w*)\b /gcxi ? "$1" : err('missing name'), /\G$ws from\b /gcx ? expr(2) : err('missing "from"'), /\G$ws to\b /gcx ? expr(2) : err('missing "to"'), /\G$ws do\b /gcx ? expr(2) : err('missing "do"'), : /\G while\b /gcx ? node NWHILE => expr(3), /\G$ws do\b /gcx ? expr(2) : err 'missing "do"' : /\G ([a-z]\w*)\b /gcx ? do { my $name = $1; /\G$aws/gc; /\G =(?!=) /gcxi ? node STORE => $name, expr(5) : /\G with\b /gcxi ? $memory{$name} = node LAMBDA => node(PARAMS => namelist), expr(2) : /\G as\b /gcxi ? $memory{$name} = node LAMBDA => node('PARAMS'), expr(2) : /\G \( /gcxi ? closeparen node CALL => $name, node(ARGS => /\G$ws (?= \) ) /gcx ? () : getlist) : node FETCH => $name } : err 'bad operand'; # precedences: (12 **) (11 * /) (10 + -) (9 < > <= >=) (8 == !=) # (7 &&) (6 ||) (5 ?) (4 and) (3 or) (2 while) (1 ;) /\G$aws/gc, $tree = /\G \*\* /gcx ? node POW => $tree, expr(12) : $_[0] > 11 ? return $tree : /\G \* /gcx ? node MUL => $tree, expr(12) : /\G \/ /gcx ? node DIV => $tree, expr(12) : $_[0] > 10 ? return $tree : /\G \+ /gcx ? node ADD => $tree, expr(11) : /\G - /gcx ? node SUB => $tree, expr(11) : $_[0] > 9 ? return $tree : /\G >= /gcx ? node GE => $tree, expr(10) : /\G > /gcx ? node GT => $tree, expr(10) : /\G <= /gcx ? node LE => $tree, expr(10) : /\G < /gcx ? node LT => $tree, expr(10) : $_[0] > 8 ? return $tree : /\G == /gcx ? node EQ => $tree, expr(9) : /\G != /gcx ? node NE => $tree, expr(9) : $_[0] > 7 ? return $tree : /\G && /gcx ? node AND => $tree, expr(8) : $_[0] > 6 ? return $tree : /\G \|\| /gcx ? node OR => $tree, expr(7) : $_[0] > 5 ? return $tree : /\G \? /gcx ? node COND => $tree, expr(5), /\G$ws : /gcx ? expr(5) : err 'missing :' : $_[0] > 4 ? return $tree : /\G and\b /gcx ? node AND => $tree, expr(5) : $_[0] > 3 ? return $tree : /\G or\b /gcx ? node OR => $tree, expr(4) : $_[0] > 2 ? return $tree : /\G while\b /gcx ? node WHILE => $tree, expr(3) : $_[0] > 1 ? return $tree : /\G ; /gcx ? ( /\G$ws(?= ; | \) | \z )/gcx ? $tree : node STMT => ref $tree eq 'STMT' ? @$tree : $tree, expr(2) ) : return $tree while 1; } # control section for ( @ARGV ? @ARGV : grep /\S/, split /^__END__\n/m, join '', <DATA> +) { eval { $hasprint = $running = %memory = (); my $tree = expr(0); /\G$ws \z /gcx or err("incomplete parse"); print "\n", s/\s*\z/\n/r; showtree( $tree, 0 ); # uncomment to see AST $running++; my $answer = $tree->v; $hasprint or print "= $answer\n"; 1 } or err($@); } sub showtree { my ($t, $i) = @_; print '. ' x $i, ref $t eq 'NUM' ? $t->[0] : ref $t || $t, "\n"; showtree( $_, $i + 1 ) for ref $t && ref $t ne 'NUM' ? @$t : (); } # interpreter section sub ADD::v { $_[0][0]->v + $_[0][1]->v } sub SUB::v { $_[0][0]->v - $_[0][1]->v } sub MUL::v { $_[0][0]->v * $_[0][1]->v } sub DIV::v { $_[0][0]->v / $_[0][1]->v } sub POW::v { $_[0][0]->v ** $_[0][1]->v } sub AND::v { $_[0][0]->v and $_[0][1]->v } sub OR::v { $_[0][0]->v or $_[0][1]->v } sub NUM::v { $_[0][0] } sub NEG::v { -$_[0][0]->v } sub GT::v { $_[0][0]->v > $_[0][1]->v or 0 } sub GE::v { $_[0][0]->v >= $_[0][1]->v or 0 } sub LE::v { $_[0][0]->v <= $_[0][1]->v or 0 } sub LT::v { $_[0][0]->v < $_[0][1]->v or 0 } sub EQ::v { $_[0][0]->v == $_[0][1]->v or 0 } sub NE::v { $_[0][0]->v != $_[0][1]->v or 0 } sub COND::v { $_[0][0]->v ? $_[0][1]->v : $_[0][2]->v } sub FETCH::v { $memory{$_[0][0]} // err(" variable $_[0][0] never set" +) } sub STORE::v { $memory{$_[0][0]} = $_[0][1]->v } sub PRINT::v { my $t = 0; print "> @{[ map $t = $_->v, @{ $_[0] } ]}\n +"; $t } sub WHILE::v { $_[0][0]->v while $_[0][1]->v; 0 } sub NWHILE::v { $_[0][1]->v while $_[0][0]->v; 0 } sub STMT::v { my $t = 0; $t = $_->v for @{ $_[0] }; $t} sub FOR::v { my ($t, $s, $e, $n) = (0, $_[0][1]->v, $_[0][2]->v, $_[0] +[0]); local $memory{$n}; for my $i ($s..$e){$memory{$n} = $i; $t = $_[0][3]->v} $t } sub CALL::v { ref $memory{$_[0][0]} or err("$_[0][0] not a function"); $memory{$_[0][0]}->call( map $_->v, @{ $_[0][1] } ) } sub LAMBDA::call { my @params = @{ $_[0][0] }; @params and local @memory{@params} = @_[1..$#_]; $_[0][1]->v } sub LAMBDA::v { 0 } __DATA__ 123 * 456 __END__ 2 + 3 * 4 __END__ 2 * 3 + 4 __END__ -2**3 __END__ 2 > 5 ? 3 : 4 __END__ print 40 + 2 __END__ foo = 12 ; print foo, foo + foo * 3, -1 ; 0 __END__ factorial with n as n > 1 ? n * factorial(n - 1) : 1; print 5, factorial(5) __END__ # classic factorial factorial with n as n > 1 ? n * factorial(n - 1) : 1; for n from 0 to 10 do print n, factorial(n); __END__ # towers tower(3, 1, 2, 3); tower with n from to spare as n > 0 and ( tower( n - 1, from, spare, to ); print n, from, to; tower( n - 1, spare, to, from ); ) __END__ foo = 5; foo = -1 + print foo while foo > 0; __END__ qwe = 7; while qwe > 3 do ( print qwe; qwe = qwe - 1; 41; 42 );;; __END__

      Oops, there is a /i in the wrong place(s).

      #!/usr/bin/perl # toylanguage.pl - a toy language for an example (by tybalt89) # perl'ified https://en.wikipedia.org/wiki/Pratt_parser # see __DATA__ section for examples of the language use strict; use warnings; my %memory; my ($running, $hasprint); # various flags my $ws = qr/(?:#.*|\s+)*+/; # white space my $aws = qr/(?:#.*|\s+)++/; # actual white space sub err { exit print "\n**ERROR** ", $running ? "@_" : s/\G/<** @_ **>/r, "\n" } sub getlist # for print and call { my @list = expr(5); push @list, expr(5) while /\G$ws , /gcx; return @list; } sub namelist # for formal parameter names { my @list; $1 eq 'as' ? return @list : push @list, $1 while /\G$ws ([a-z]\w*)\b /gcxi; err 'missing "as"'; } sub node { bless [ @_[1..$#_] ], $_[0] } sub closeparen { /\G$ws\)/gc ? shift : err 'missing )' } sub expr # the parser :) { /\G$aws/gc; my $tree = /\G - /gcx ? node NEG => expr(12) : # unary minus /\G \+ /gcx ? +expr(12) : # unary plus /\G \( /gcx ? closeparen expr(0) : /\G ((?:\d+(?:\.\d*)?|\.\d+)(?:[Ee][+-]?\d+\b)?) /gcx ? node NUM = +> $1 : /\G print\b /gcx ? ($hasprint++, node PRINT => getlist) : /\G sqrt\b $ws \( /gcx ? closeparen node SQRT => expr(5) : /\G for\b /gcx ? node FOR => /\G$ws ([a-z]\w*)\b /gcxi ? "$1" : err('missing name'), /\G$ws from\b /gcx ? expr(2) : node(NUM => 1), /\G$ws to\b /gcx ? expr(2) : err('missing "to"'), /\G$ws by\b /gcx ? expr(2) : node(NUM => 1), /\G$ws do\b /gcx ? expr(2) : err('missing "do"'), : /\G while\b /gcx ? node NWHILE => expr(3), /\G$ws do\b /gcx ? expr(2) : err 'missing "do"' : /\G ([a-z]\w*)\b /gcxi ? do { my $name = $1; /\G$aws/gc; /\G =(?!=) /gcx ? node STORE => $name, expr(5) : /\G with\b /gcx ? $memory{$name} = node LAMBDA => $name, node(PARAMS => namelist), expr(2) : /\G as\b /gcx ? $memory{$name} = node LAMBDA => $name, node('PARAMS'), expr(2) : /\G \( /gcx ? closeparen node CALL => $name, node(ARGS => /\G$ws (?= \) ) /gcx ? () : getlist) : node FETCH => $name } : err 'bad operand'; # precedences: (12 **) (11 * /) (10 + -) (9 < > <= >=) (8 == !=) # (7 &&) (6 ||) (5 ?) (4 and) (3 or) (2 while) (1 ;) /\G$aws/gc, $tree = /\G \*\* /gcx ? node POW => $tree, expr(12) : $_[0] > 11 ? return $tree : /\G \* /gcx ? node MUL => $tree, expr(12) : /\G \/ /gcx ? node DIV => $tree, expr(12) : $_[0] > 10 ? return $tree : /\G \+ /gcx ? node ADD => $tree, expr(11) : /\G - /gcx ? node SUB => $tree, expr(11) : $_[0] > 9 ? return $tree : /\G >= /gcx ? node GE => $tree, expr(10) : /\G > /gcx ? node GT => $tree, expr(10) : /\G <= /gcx ? node LE => $tree, expr(10) : /\G < /gcx ? node LT => $tree, expr(10) : $_[0] > 8 ? return $tree : /\G == /gcx ? node EQ => $tree, expr(9) : /\G != /gcx ? node NE => $tree, expr(9) : $_[0] > 7 ? return $tree : /\G && /gcx ? node AND => $tree, expr(8) : $_[0] > 6 ? return $tree : /\G \|\| /gcx ? node OR => $tree, expr(7) : $_[0] > 5 ? return $tree : /\G \? /gcx ? node COND => $tree, expr(5), /\G$ws : /gcx ? expr(5) : err 'missing :' : $_[0] > 4 ? return $tree : /\G and\b /gcx ? node AND => $tree, expr(5) : $_[0] > 3 ? return $tree : /\G or\b /gcx ? node OR => $tree, expr(4) : $_[0] > 2 ? return $tree : /\G while\b /gcx ? node WHILE => $tree, expr(3) : $_[0] > 1 ? return $tree : /\G ; (?= $ws (?: ; | \) | \z ) ) /gcx ? $tree : /\G ; /gcx ? node STMT => ref $tree eq 'STMT' ? @$tree : $tree, ex +pr(2) : return $tree while 1; } # control section for ( @ARGV ? @ARGV : grep /\S/, split /^__END__\n/m, join '', <DATA> +) { eval { $hasprint = $running = %memory = (); my $tree = expr(0); /\G$ws \z /gcx or err("incomplete parse"); print "\n", s/\s*\z/\n/r; showtree( $tree, 0 ); # uncomment to see AST $running++; my $answer = $tree->v; $hasprint or print "= $answer\n"; 1 } or err($@); } sub showtree { my ($t, $i) = @_; print '. ' x $i, ref $t eq 'NUM' ? $t->[0] : ref $t || $t, "\n"; showtree( $_, $i + 1 ) for ref $t && ref $t ne 'NUM' ? @$t : (); } # interpreter section sub ADD::v { $_[0][0]->v + $_[0][1]->v } sub SUB::v { $_[0][0]->v - $_[0][1]->v } sub MUL::v { $_[0][0]->v * $_[0][1]->v } sub DIV::v { $_[0][0]->v / $_[0][1]->v } sub POW::v { $_[0][0]->v ** $_[0][1]->v } sub AND::v { $_[0][0]->v and $_[0][1]->v } sub OR::v { $_[0][0]->v or $_[0][1]->v } sub NUM::v { $_[0][0] } sub NEG::v { -$_[0][0]->v } sub GT::v { $_[0][0]->v > $_[0][1]->v or 0 } sub GE::v { $_[0][0]->v >= $_[0][1]->v or 0 } sub LE::v { $_[0][0]->v <= $_[0][1]->v or 0 } sub LT::v { $_[0][0]->v < $_[0][1]->v or 0 } sub EQ::v { $_[0][0]->v == $_[0][1]->v or 0 } sub NE::v { $_[0][0]->v != $_[0][1]->v or 0 } sub COND::v { $_[0][0]->v ? $_[0][1]->v : $_[0][2]->v } sub FETCH::v { $memory{$_[0][0]} // err(" variable $_[0][0] never set" +) } sub STORE::v { $memory{$_[0][0]} = $_[0][1]->v } sub PRINT::v { my $t = 0; print "> @{[ map $t = $_->v, @{ $_[0] } ]}\n +"; $t } sub SQRT::v { sqrt $_[0][0]->v } sub WHILE::v { $_[0][0]->v while $_[0][1]->v; 0 } sub NWHILE::v { $_[0][1]->v while $_[0][0]->v; 0 } sub STMT::v { my $t = 0; $t = $_->v for @{ $_[0] }; $t} sub FOR::v { my ($t, $s, $e, $b, $n) = (0, $_[0][1]->v, $_[0][2]->v, $_[0][3]->v, $_[0][0]); local $memory{$n}; for(my $i = $s; $s <= $e; $i = $s += $b) {$memory{$n} = $i; $t = $_[0][4]->v} $t } sub CALL::v { ref $memory{$_[0][0]} or err("$_[0][0] not a function"); $memory{$_[0][0]}->call( map $_->v, @{ $_[0][1] } ) } sub LAMBDA::call { my @params = @{ $_[0][1] }; @params and local @memory{@params} = @_[1..$#_]; $_[0][2]->v } sub LAMBDA::v { 0 } __DATA__ 123 * 456 __END__ 2 + 3 * 4 __END__ 2 * 3 + 4 __END__ -2**3 __END__ 2 > 5 ? 3 : 4 __END__ print 40 + 2 __END__ foo = 12 ; print foo, foo + foo * 3, -1 ; 0 __END__ factorial with n as n > 1 ? n * factorial(n - 1) : 1; print 5, factorial(5) __END__ # classic factorial factorial with n as n > 1 ? n * factorial(n - 1) : 1; for n from 0 to 10 do print n, factorial(n); __END__ # towers tower(3, 1, 2, 3); tower with n from to spare as n > 0 and ( tower( n - 1, from, spare, to ); print n, from, to; tower( n - 1, spare, to, from ); ) __END__ foo = 5; foo = -1 + print foo while foo > 0; __END__ qwe = 7; while qwe > 3 do ( print qwe; qwe = qwe - 1; 41; 42 );;; __END__