Re: Parsing using Parse::YYLex package

Use Parse::Eyapp or Parse::Yapp instead of byacc; both provide an OOP interface. I would not recommend Parse::YYLex: it seems unstable.

The usual convention for tracking line numbers is to make the attribute associated with each token a little richer than a plain value: an anonymous array or hash.

The following is an example of a lexer for a C-like language written for Parse::Eyapp. The token attribute is an array holding the value and the line number. Thus, for a number we return:

('INUM',[$1, $tokenbegin])
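On the consuming side, a semantic action or an error routine can unpack that pair again. A minimal sketch, assuming the two-element [value, line] layout shown above; `describe_token` is a hypothetical helper for illustration, not part of Parse::Eyapp:

```perl
use strict;
use warnings;

# Hypothetical helper: formats a token whose attribute is [value, line].
sub describe_token {
    my ($type, $attr) = @_;
    my ($value, $line) = @$attr;    # unpack the anonymous array
    return "$type '$value' at line $line";
}

# The pair a lexer would return for the number 42 found on line 7.
my @token = ('INUM', [42, 7]);
print describe_token(@token), "\n";
```

Using a hash instead of an array (for example with keys for value and line) would work the same way and leaves room for a column or file name later.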

Lines 9-23 show how to compute line numbers using the tr operator. Newlines are skipped along with the rest of the white space and the comments:

$ sed -ne '513,567p' Types.eyp | cat -n
 1  sub _Lexer {
 2    my($parser)=shift;
 3
 4    my $token;
 5    for ($parser->YYData->{INPUT}) { # not a real loop: "for" just aliases $_ to the input
 6        return('',undef) if !defined($_) or $_ eq '';
 7
 8        #Skip blanks
 9        s{\A
10           ((?:
11                \s+       # any white space char
12            |   /\*.*?\*/ # C like comments
13            )+
14           )
15         }
16         {}xs
17        and do { # Count line numbers
18              my($blanks)=$1;
19
20              #Maybe At EOF
21              return('', undef) if $_ eq '';
22              $tokenend += $blanks =~ tr/\n//;
23          };
24
25       $tokenbegin = $tokenend;
26
27        s/^('.')//
28                and return('CHARCONSTANT', [$1, $tokenbegin]);
29
30        s/^([0-9]+(?:\.[0-9]+)?)//
31                and return('INUM',[$1, $tokenbegin]);
32
33        s/^([A-Za-z][A-Za-z0-9_]*)//
34          and do {
35            my $word = $1;
36            my $r;
37            return ($r, [$r, $tokenbegin]) if defined($r = $reserved{$word});
38            return('ID',[$word, $tokenbegin]);
39        };
40
41        m/^(\S\S)/ and  defined($token = $1) and exists($lexeme{$token})
42          and do {
43            s/..//;
44            return ($token, [$token, $tokenbegin]);
45          }; # do
46
47        m/^(\S)/ and defined($token = $1) and  exists($lexeme{$token})
48          and do {
49            s/.//;
50            return ($token, [$token, $tokenbegin]);
51          }; # do
52
53        die "Unexpected character at $tokenbegin\n";
54    } # for
55  }
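The line-counting step can be tried in isolation. The following is a minimal sketch, not the Types.eyp code: `skip_blanks` is a hypothetical helper that strips leading white space and C-style comments from a string reference and returns how many newlines it consumed, and the starting line number of 1 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: removes leading blanks and /* ... */ comments from
# the string referenced by $input and returns the number of newlines skipped.
sub skip_blanks {
    my ($input) = @_;
    if ($$input =~ s{\A((?:\s+|/\*.*?\*/)+)}{}s) {
        my $blanks = $1;
        # tr/\n// with an empty replacement list only counts the newlines;
        # it does not modify $blanks.
        return $blanks =~ tr/\n//;
    }
    return 0;
}

my $source = "\n\n/* a comment\n spanning lines */\n  x = 1;";
my $line   = 1;    # assumed: counting starts at line 1
$line += skip_blanks(\$source);
print "next token is on line $line, remaining input: $source\n";
```

The /s modifier matters here: without it, `.` would not match a newline and a comment spanning several lines would not be skipped.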

The hash %reserved contains the reserved words of the language. Lines 41-51 deal with lexemes such as '=' and '**', trying the two-character lexemes before the one-character ones. The hash %lexeme looks something like this:


my %lexeme = (
  '='  => "ASSIGN",
  .................
  ']'  => "RIGHTBRACKET",
  '==' => "EQ",
  '+=' => "PLUSASSIGN",
  ...................
  '--' => "DEC",
  '**' => "EXP"
);
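Checking two-character lexemes before one-character ones is what makes '==' come out as a single token rather than as two '=' tokens. A minimal sketch of that lookup order, assuming a small sample table: the `next_operator` helper and the '*' entry are illustrative and not taken from Types.eyp.

```perl
use strict;
use warnings;

# Sample table only; the real %lexeme has many more entries.
my %lexeme = (
    '='  => 'ASSIGN',
    '==' => 'EQ',
    '*'  => 'TIMES',    # hypothetical entry added for this sketch
    '**' => 'EXP',
);

# Hypothetical helper: consumes the longest known operator at the start of
# the string referenced by $input and returns it, or undef if none matches.
sub next_operator {
    my ($input) = @_;
    for my $len (2, 1) {                      # longest candidate first
        next if length($$input) < $len;
        my $candidate = substr($$input, 0, $len);
        if (exists $lexeme{$candidate}) {
            substr($$input, 0, $len) = '';    # remove it from the input
            return $candidate;
        }
    }
    return;
}

my $text = '**=';
while (defined(my $op = next_operator(\$text))) {
    print "$op => $lexeme{$op}\n";
}
```

Reversing the order to (1, 2) would split '**' into two '*' tokens, which is the failure the ordering in the lexer avoids.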

Hope it helps

Casiano


Replies are listed 'Best First'.
Re^2: Parsing using Parse::YYLex package by Anonymous Monk on Jun 04, 2008 at 04:45 UTC

Casiano,

Thanks, but I have already applied the patch for the object-oriented interface to my byacc package, so the OOP interface is not a problem. What I found is that I don't have much control over the Perl lexer. I have a lexer handle built from the list of tokens defined in @tokens:

$libLexer = Parse::YYLex->new(@tokens);

I skip '\n' using the following:

Parse::YYLex->skip('([\\ \t\r\n]+)|(\/\*.*\*\/)');

Now I want to associate an action with '\n', but I don't want to modify the rules of the existing grammar, which works fine without this line-number facility. Is it possible to ignore '\n' after performing an associated action?