
Use Parse::Eyapp or Parse::Yapp instead of byacc. Both provide an object-oriented interface. I would not recommend Parse::YYLex: it seems unstable.

The usual way to handle line numbers is to make the attribute associated with each token a little richer than a plain scalar: an anonymous array or hash.

The following is an example of a lexer for a C-like language written for Parse::Eyapp. The token attribute is an anonymous array holding the value and the line number. Thus, for a number we return

('INUM',[$1, $tokenbegin])
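
On the consuming side, whoever receives the token only has to dereference the array to get the value and the line back. A minimal sketch in plain Perl (describe_token is a hypothetical helper made up for illustration, not part of Parse::Eyapp):

```perl
use strict;
use warnings;

# Hypothetical helper: build a diagnostic string from a token type and
# an attribute of the form [ $value, $line ], as the lexer returns it.
sub describe_token {
    my ($type, $attr) = @_;
    my ($value, $line) = @$attr;    # unpack the anonymous array
    return "$type '$value' at line $line";
}

print describe_token('INUM', ['42', 7]), "\n";   # INUM '42' at line 7
```

A hash such as { value => $1, line => $tokenbegin } would work the same way and is easier to extend later, at the cost of slightly more typing in the actions.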

Lines 9-23 show how to compute line numbers with the tr operator. Newlines are skipped together with the rest of the white space and the comments, and tr counts how many of them were in the skipped text:

$ sed -ne '513,567p' Types.eyp | cat -n
 1  sub _Lexer {
 2    my($parser)=shift;
 3
 4    my $token;
 5    for ($parser->YYData->{INPUT}) { # note: not a real loop, it only aliases $_ to the input
 6        return('',undef) if !defined($_) or $_ eq '';
 7
 8        #Skip blanks
 9        s{\A
10           ((?:
11                \s+       # any white space char
12            |   /\*.*?\*/ # C like comments
13            )+
14           )
15         }
16         {}xs
17        and do { # Count line numbers
18              my($blanks)=$1;
19
20              #Maybe At EOF
21              return('', undef) if $_ eq '';
22              $tokenend += $blanks =~ tr/\n//;
23          };
24
25       $tokenbegin = $tokenend;
26
27        s/^('.')//
28                and return('CHARCONSTANT', [$1, $tokenbegin]);
29
30        s/^([0-9]+(?:\.[0-9]+)?)//
31                and return('INUM',[$1, $tokenbegin]);
32
33        s/^([A-Za-z][A-Za-z0-9_]*)//
34          and do {
35            my $word = $1;
36            my $r;
37            return ($r, [$r, $tokenbegin]) if defined($r = $reserved{$word});
38            return('ID',[$word, $tokenbegin]);
39        };
40
41        m/^(\S\S)/ and  defined($token = $1) and exists($lexeme{$token})
42          and do {
43            s/..//;
44            return ($token, [$token, $tokenbegin]);
45          }; # do
46
47        m/^(\S)/ and defined($token = $1) and  exists($lexeme{$token})
48          and do {
49            s/.//;
50            return ($token, [$token, $tokenbegin]);
51          }; # do
52
53        die "Unexpected character at $tokenbegin\n";
54    } # for
55  }
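
The counting trick in line 22 can be tried in isolation. This is only a sketch: count_newlines and the sample string are made up for illustration. With an empty replacement list and no modifiers, tr/\n// leaves the string untouched and returns the number of newlines it found:

```perl
use strict;
use warnings;

# Hypothetical helper: count the newlines in a chunk of skipped text.
sub count_newlines {
    my ($text) = @_;
    return ($text =~ tr/\n//);    # counts, does not modify $text
}

my $line    = 1;
my $skipped = "  /* a comment\n spanning lines */\n\n";
$line += count_newlines($skipped);
print "next token starts on line $line\n";   # next token starts on line 4
```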

The hash %reserved contains the reserved words of the language. Lines 41-51 deal with lexemes such as '=' and '**': two-character lexemes are tried first, then single-character ones. The hash %lexeme looks something like this:


my %lexeme = (
  '='  => "ASSIGN",
  # ...
  ']'  => "RIGHTBRACKET",
  '==' => "EQ",
  '+=' => "PLUSASSIGN",
  # ...
  '--' => "DEC",
  '**' => "EXP"
);
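
The reason for trying two characters before one is that otherwise '==' would be read as two '=' tokens. A small self-contained sketch of that longest-match lookup (next_operator is a hypothetical helper, and the table below is just the subset of %lexeme shown in this post):

```perl
use strict;
use warnings;

# Illustrative subset only; the real table has more entries.
my %lexeme = (
    '='  => 'ASSIGN',
    ']'  => 'RIGHTBRACKET',
    '==' => 'EQ',
    '+=' => 'PLUSASSIGN',
    '--' => 'DEC',
    '**' => 'EXP',
);

# Hypothetical helper: return the operator at the start of $input,
# preferring the two-character form, or nothing if there is none.
sub next_operator {
    my ($input) = @_;
    for my $len (2, 1) {
        next if length($input) < $len;
        my $candidate = substr($input, 0, $len);
        return $candidate if exists $lexeme{$candidate};
    }
    return;    # no known operator here
}

print next_operator('==b'), "\n";   # ==
print next_operator('=b'),  "\n";   # =
```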

Hope it helps

Casiano

In reply to Re: Parsing using Parse::YYLex package by casiano
in thread Parsing using Parse::YYLex package by Anonymous Monk
