Re: The Relation Between Lexers and Parsers

Here's a quick non-OO lexer that is very similar to the lexer Damian Conway builds in his book Object Oriented Perl (in the book, he blesses a subroutine that is almost exactly like this one, and then adds some utility methods to go along with it). Given the input my $x = "my $x = \"my $x\""; it breaks the statement into its components (keyword, variable, quoted string, and semicolon). The = is not a defined token, so it is returned with an empty type and skipped.
Note: all you have to do is define your tokens in terms of regular expressions.
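
Because the whole lexer depends on the token regexes, it pays to test them in isolation. The sketch below compares the QSTRING pattern as written in the token table with an escape-aware variant. The variant and the sample inputs are assumptions for illustration, not something from Conway's book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# QSTRING as written in the token table. In a single-quoted Perl string
# '\\"' is just \", and \" in a regex matches a plain quote, so the
# alternation accepts any character and the match runs greedily to the
# LAST quote on the line.
my $greedy = '"(?:[^"]|\\")*"';

# Assumed escape-aware alternative (not from the original post): any
# character except " or \, or a backslash followed by any character.
my $escaped = '"(?:[^"\\\\]|\\\\.)*"';

my $input = '"a" . "b"';

my ($g) = $input =~ /\A($greedy)/;
my ($e) = $input =~ /\A($escaped)/;

print "greedy:  $g\n";    # "a" . "b"
print "escaped: $e\n";    # "a"

# Escaped quotes inside a string are still kept in one token.
my ($s) = '"my $x = \"my $x\""; rest' =~ /\A($escaped)/;
print "escaped: $s\n";    # "my $x = \"my $x\""
```

With only one string per file the greedy pattern happens to work, which is why the sample input lexes correctly; two strings on one line expose the difference.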


#!/usr/bin/perl -w

use strict;

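# Token names and the regular expressions that define them.
my $tokens = {
                'KEYWORD' => 'my\b',                    # \b: match "my" only as a whole word
                'QSTRING' => '"(?:[^"\\\\]|\\\\.)*"',   # double-quoted string; \" escapes stay inside it
                'VAR'     => '\$\w+',
                'SEMI'    => ';',
         };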



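my $file = $ARGV[0];
die("Usage: $0 FILE\n") unless(defined($file));
open(my $fh, '<', $file) || die("Error opening $file: $!\n");
my $text = do { local $/; <$fh> };    # slurp the whole file; $/ is restored afterwards
close($fh);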

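my $lexer = create_lexer($tokens);
while (1) {
 my @token = $lexer->($text);    # removes the next token from the front of $text
 last unless(defined($token[0]));
 # Characters that match no token come back with an empty type; skip them.
 print("Got: $token[0] => $token[1]\n") if(defined($$tokens{$token[0]}));
}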

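# Build a lexer closure from a hash of TOKEN_NAME => regex-string pairs.
# Each call strips the next token off the front of its argument (modified
# in place through the @_ alias) and returns (type, text).
sub create_lexer {
 my($tokens) = shift;
 my $code = '';
 # Sort the keys so the patterns are tried in a predictable order;
 # plain keys() order can change from run to run.
 foreach my $key (sort(keys(%$tokens))) {
  $code .= '$_[0] =~ s/\A\s*(' . $$tokens{$key} . ')// ';
  $code .= "&& return('$key'," . '$1);' . "\n";
 }
 # Fallback: return any other non-space character with an empty type.
 $code .= '$_[0] =~ s/\A\s*(\S)// && return("",$1);' . "\n";
 # Only whitespace (or nothing) is left: signal end of input.
 $code .= "return(undef,undef);\n";
 my $sub = eval("sub { $code }") || die($@);
 return($sub);
}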
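
A variation worth knowing about avoids the string eval altogether. This is a sketch under my own assumptions, not Conway's code: make_lexer is a hypothetical name, the rules are an ordered list of qr// objects so earlier rules win, and it walks the input with Perl's \G anchor and m//gc, so the string is left intact and pos() tracks progress instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a lexer from an ORDERED list of
# [NAME => qr/pattern/] pairs; earlier entries are tried first.
sub make_lexer {
    my ($text, @rules) = @_;
    pos($text) = 0;
    return sub {
        # Skip leading whitespace; \G anchors at the current pos().
        $text =~ /\G\s+/gc;
        return if pos($text) >= length($text);    # end of input
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            # /gc advances pos() on success and keeps it on failure.
            return ($name, $1) if $text =~ /\G($re)/gc;
        }
        # No rule matched: hand back one character with an empty type.
        $text =~ /\G(.)/gcs;
        return ('', $1);
    };
}

my $lexer = make_lexer(
    'my $x = "my $x = \"my $x\"";',
    [ KEYWORD => qr/my\b/ ],
    [ QSTRING => qr/"(?:[^"\\]|\\.)*"/ ],
    [ VAR     => qr/\$\w+/ ],
    [ SEMI    => qr/;/ ],
);

my @seen;
while (my ($type, $value) = $lexer->()) {
    next if $type eq '';    # skip characters no rule claims (here "=")
    push @seen, "$type => $value";
    print "Got: $type => $value\n";
}
```

Keeping the rules in an array rather than a hash makes the matching order explicit, which matters as soon as two patterns can match at the same spot (for example a keyword and a general identifier).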

cephas

BTW: Did I mention that Damian Conway wrote a great book, <bold>Object Oriented Perl</bold>? (And just in case there was any doubt, I got my idea for this lexer from him, and not the other way around. :) )
