Use Parse::Eyapp or Parse::Yapp instead of
byacc. Both provide an OOP interface.
I would not recommend Parse::YYLex: it seems unstable.
The usual way to keep track of line
numbers is to make the attribute associated
with each token
somewhat more complex: an anonymous array or hash.
The following is an example of a lexer for a C-like language
written in Parse::Eyapp. The token attribute
is an anonymous array holding the value and the line number. Thus, for a number we return
('INUM',[$1, $tokenbegin])
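As a minimal sketch of how code receiving such a token might unpack the attribute: the helper names `describe_token` and `describe_token_hash`, the hash keys `value` and `line`, and the sample values are all made up for illustration and do not come from Types.eyp.

```perl
use strict;
use warnings;

# Array-style attribute: element 0 is the value, element 1 the line number.
sub describe_token {
    my ($type, $attr) = @_;
    my ($value, $line) = @$attr;
    return "$type $value at line $line";
}

# Hash-style alternative (hypothetical key names 'value' and 'line').
sub describe_token_hash {
    my ($type, $attr) = @_;
    return "$type $attr->{value} at line $attr->{line}";
}

print describe_token('INUM', [42, 7]), "\n";
print describe_token_hash('INUM', { value => 42, line => 7 }), "\n";
```

The array form is more compact; the hash form is easier to extend if you later want to carry a column or a file name as well.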
Lines 9-23 of the listing below show how to compute line numbers using
the tr operator. Newlines are skipped as white space:
$ sed -ne '513,567p' Types.eyp | cat -n
1 sub _Lexer {
2 my($parser)=shift;
3
4 my $token;
5 for ($parser->YYData->{INPUT}) { # warn! false for
6 return('',undef) if !defined($_) or $_ eq '';
7
8 #Skip blanks
9 s{\A
10 ((?:
11 \s+ # any white space char
12 | /\*.*?\*/ # C like comments
13 )+
14 )
15 }
16 {}xs
17 and do { # Count line numbers
18 my($blanks)=$1;
19
20 #Maybe At EOF
21 return('', undef) if $_ eq '';
22 $tokenend += $blanks =~ tr/\n//;
23 };
24
25 $tokenbegin = $tokenend;
26
27 s/^('.')//
28 and return('CHARCONSTANT', [$1, $tokenbegin]);
29
30 s/^([0-9]+(?:\.[0-9]+)?)//
31 and return('INUM',[$1, $tokenbegin]);
32
33 s/^([A-Za-z][A-Za-z0-9_]*)//
34 and do {
35 my $word = $1;
36 my $r;
37 return ($r, [$r, $tokenbegin]) if defined($r = $reserved{$word});
38 return('ID',[$word, $tokenbegin]);
39 };
40
41 m/^(\S\S)/ and defined($token = $1) and exists($lexeme{$token})
42 and do {
43 s/..//;
44 return ($token, [$token, $tokenbegin]);
45 }; # do
46
47 m/^(\S)/ and defined($token = $1) and exists($lexeme{$token})
48 and do {
49 s/.//;
50 return ($token, [$token, $tokenbegin]);
51 }; # do
52
53 die "Unexpected character at $tokenbegin\n";
54 } # for
55 }
The hash %reserved contains the reserved words of the language. Lines 41-51 of the lexer deal with lexemes such as
'=' and '**'. The hash
%lexeme contains something like:
my %lexeme = (
'=' => "ASSIGN",
.................
']' => "RIGHTBRACKET",
'==' => "EQ",
'+=' => "PLUSASSIGN",
...................
'--' => "DEC",
'**' => "EXP"
);
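The reason the lexer tries two characters before one is so that '==' is not read as two '=' tokens. Here is a small standalone sketch of that longest-match idea. It uses only the %lexeme entries shown above; the helper `next_operator` is hypothetical and uses substr rather than the regexes of the lexer.

```perl
use strict;
use warnings;

# Only the entries visible in the post; the real hash has more.
my %lexeme = (
    '='  => 'ASSIGN',
    ']'  => 'RIGHTBRACKET',
    '==' => 'EQ',
    '+=' => 'PLUSASSIGN',
    '--' => 'DEC',
    '**' => 'EXP',
);

# Hypothetical helper: remove the operator at the front of the string
# referenced by $input_ref and return it, or return undef if none matches.
sub next_operator {
    my ($input_ref) = @_;
    for my $len (2, 1) {                       # longest candidate first
        my $candidate = substr($$input_ref, 0, $len);
        next unless length($candidate) == $len && exists $lexeme{$candidate};
        substr($$input_ref, 0, $len, '');      # consume the matched text
        return $candidate;
    }
    return;
}

my $input = '==]';
my @ops;
while (defined(my $op = next_operator(\$input))) {
    push @ops, "$op ($lexeme{$op})";
}
print join(', ', @ops), "\n";
```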
Hope it helps
Casiano