in reply to The Relation Between Lexers and Parsers

Here's a quick non-OO lexer that is very similar to the lexer Damian Conway creates in his book Object Oriented Perl. (In the book, he blesses a subroutine that is almost exactly like this, and then adds some utility methods to go along with it.) This will break my $x = "my $x = \"my $x\""; into its components (keyword, variable, quoted string, and semicolon). The = has no token defined for it, so it is read but not reported.
Note: all you have to do is define your tokens in terms of regular expressions.

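#!/usr/bin/perl -w
use strict;

# Token name => regular expression (a string spliced into the generated code).
my $tokens = {
    'KEYWORD' => 'my\b',
    'QSTRING' => '"(?:[^"\\\\]|\\\\.)*"',   # a backslash escapes the next character
    'VAR'     => '\$\w+',
    'SEMI'    => ';',
};

# Slurp the whole file named on the command line.
open(FH, "<", $ARGV[0]) || die("Error opening $ARGV[0]: $!\n");
undef $/;
my $text = <FH>;
close(FH);

my $lexer = &create_lexer($tokens);

# Pull one token at a time off the front of $text until nothing is left.
{
    my @token = &$lexer($text);
    last unless(defined($token[0]));
    print("Got: $token[0] => $token[1]\n") if(defined($$tokens{$token[0]}));
    redo;
}

# Build the lexer as source code (one s/// per token) and eval it.
# keys() returns the tokens in no particular order, so patterns must not overlap.
sub create_lexer {
    my($tokens) = shift;
    my($sub,$code,$key);
    foreach $key (keys(%$tokens)) {
        $code .= '$_[0] =~ s/\A\s*?(' . $$tokens{$key} . ')// ';
        $code .= "&& return('$key'," . '$1);' . "\n";
    }
    # Unrecognized characters come back one at a time with an empty type.
    $code .= '$_[0] =~ s/\A\s*?(\S)// && return("",$1);';
    $code .= "return(undef,undef);\n";
    $sub = eval("sub { $code } ") || die($@);
    return($sub);
}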
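For comparison, here is a rough sketch of the same idea without the string eval. It is not how the book does it, just one possible variant: the tokens live in an ordered list of precompiled qr// patterns, so the order in which they are tried is fixed rather than left to keys(). The names make_lexer and lex_all and the @token_spec layout are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ordered token table (hypothetical layout): earlier entries are tried first.
my @token_spec = (
    [ KEYWORD => qr/my\b/ ],
    [ QSTRING => qr/"(?:[^"\\]|\\.)*"/ ],
    [ VAR     => qr/\$\w+/ ],
    [ SEMI    => qr/;/ ],
);

# make_lexer (hypothetical name) returns a closure that strips one token
# from the front of the string it is given and returns (type, value).
sub make_lexer {
    my @spec = @_;
    return sub {
        $_[0] =~ s/\A\s+//;                     # skip leading whitespace
        return (undef, undef) if $_[0] eq '';   # input exhausted
        for my $entry (@spec) {
            my ($name, $re) = @$entry;
            return ($name, $1) if $_[0] =~ s/\A($re)//;
        }
        $_[0] =~ s/\A(.)//s;                    # unknown character
        return ('', $1);
    };
}

# lex_all (hypothetical helper) collects every recognized token as [type, value].
sub lex_all {
    my ($text) = @_;
    my $lexer = make_lexer(@token_spec);
    my @found;
    while (1) {
        my ($type, $value) = $lexer->($text);
        last unless defined $type;
        push @found, [$type, $value] if $type ne '';
    }
    return @found;
}

my $line = 'my $x = "my $x = \"my $x\"";';
print "Got: $_->[0] => $_->[1]\n" for lex_all($line);
```

Run on that one statement, this should report a KEYWORD, a VAR, a QSTRING and a SEMI, in that order; the = falls through to the unknown-character branch and is dropped. The trade-off is a loop over the patterns on every call instead of one generated block of code.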


cephas

BTW: Did I mention that Damian Conway wrote a great book, <bold>Object Oriented Perl</bold>? (And just in case there was any doubt, I got my idea for this lexer from him, and not the other way around. :) )