in reply to Parsing using Parse::YYLex package

Use Parse::Eyapp or Parse::Yapp instead of byacc. Both provide an OOP interface. I would not recommend Parse::YYLex: it seems unstable.

The usual way to handle line numbers is to make the attribute associated with each token slightly richer than a plain scalar: an anonymous array or hash.

The following is an example of a Lexer for a C-like language written in Parse::Eyapp. The token attribute is an array having the value and the line number. Thus, for a number we return

('INUM',[$1, $tokenbegin])
Lines 9-23 show how to compute line numbers using the tr operator. Newlines are skipped along with the rest of the white space:
$ sed -ne '513,567p' Types.eyp | cat -n
     1  sub _Lexer {
     2    my($parser)=shift;
     3
     4    my $token;
     5    for ($parser->YYData->{INPUT}) {  # warn! false for
     6      return('',undef) if !defined($_) or $_ eq '';
     7
     8      #Skip blanks
     9      s{\A
    10         ((?:
    11              \s+       # any white space char
    12          |   /\*.*?\*/ # C like comments
    13          )+
    14         )
    15       }
    16       {}xs
    17        and do { # Count line numbers
    18          my($blanks)=$1;
    19
    20          #Maybe At EOF
    21          return('', undef) if $_ eq '';
    22          $tokenend += $blanks =~ tr/\n//;
    23        };
    24
    25      $tokenbegin = $tokenend;
    26
    27      s/^('.')//
    28        and return('CHARCONSTANT', [$1, $tokenbegin]);
    29
    30      s/^([0-9]+(?:\.[0-9]+)?)//
    31        and return('INUM',[$1, $tokenbegin]);
    32
    33      s/^([A-Za-z][A-Za-z0-9_]*)//
    34        and do {
    35          my $word = $1;
    36          my $r;
    37          return ($r, [$r, $tokenbegin]) if defined($r = $reserved{$word});
    38          return('ID',[$word, $tokenbegin]);
    39        };
    40
    41      m/^(\S\S)/ and defined($token = $1) and exists($lexeme{$token})
    42        and do {
    43          s/..//;
    44          return ($token, [$token, $tokenbegin]);
    45        }; # do
    46
    47      m/^(\S)/ and defined($token = $1) and exists($lexeme{$token})
    48        and do {
    49          s/.//;
    50          return ($token, [$token, $tokenbegin]);
    51        }; # do
    52
    53      die "Unexpected character at $tokenbegin\n";
    54    } # for
    55  }
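The line-counting idiom from lines 9-23 can be tried on its own. The following is only a minimal sketch: skip_blanks is a hypothetical helper that is not part of Parse::Eyapp, and starting the count at line 1 is an assumption made for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::Eyapp): removes leading white space
# and C-style comments from the string referenced by $textref and returns the
# number of newlines that were removed.
sub skip_blanks {
    my ($textref) = @_;
    my $newlines = 0;
    if ($$textref =~ s{\A((?:\s+|/\*.*?\*/)+)}{}s) {
        my $blanks = $1;
        # tr/\n// with an empty replacement list counts matches
        # without modifying the string.
        $newlines = ($blanks =~ tr/\n//);
    }
    return $newlines;
}

my $input = "  /* a\n   comment */\n\nx = 1;";
my $line  = 1;    # assumption: line numbering starts at 1
$line += skip_blanks(\$input);
print "next token starts at line $line: $input\n";
```

Because the newlines inside comments are part of the skipped text, they are counted too, so multi-line comments do not throw the numbering off.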
The hash %reserved contains the reserved words of the language. Lines 41-51 deal with lexemes such as '=' and '**'. The hash %lexeme looks something like:
my %lexeme = (
  '='  => "ASSIGN",
  .................
  ']'  => "RIGHTBRACKET",
  '==' => "EQ",
  '+=' => "PLUSASSIGN",
  ...................
  '--' => "DEC",
  '**' => "EXP"
);
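Trying the two-character lexemes before the one-character ones is what makes '==' come out as a single token rather than two '='. Here is a small standalone sketch of that lookup order. The table is only an illustrative subset of the entries shown above, and next_operator is a hypothetical helper name, not something taken from Types.eyp.

```perl
use strict;
use warnings;

# Illustrative subset only; the full %lexeme table is elided in the post.
my %lexeme = (
    '='  => 'ASSIGN',
    ']'  => 'RIGHTBRACKET',
    '==' => 'EQ',
    '+=' => 'PLUSASSIGN',
    '--' => 'DEC',
    '**' => 'EXP',
);

# Hypothetical helper: removes one operator from the front of the string
# referenced by $textref and returns it, trying two characters before one.
# Returns undef when no known operator starts the string.
sub next_operator {
    my ($textref) = @_;
    for my $len (2, 1) {
        my $candidate = substr($$textref, 0, $len);
        next unless length($candidate) == $len && exists $lexeme{$candidate};
        substr($$textref, 0, $len, '');    # consume the operator
        return $candidate;
    }
    return;
}

my $src = '==]=';
while (defined(my $op = next_operator(\$src))) {
    print "$op => $lexeme{$op}\n";
}
```

As in the lexer above, the hash is consulted only to decide whether a character sequence is a known lexeme; the lexeme itself is what gets returned.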

Hope it helps

Casiano

Re^2: Parsing using Parse::YYLex package
by Anonymous Monk on Jun 04, 2008 at 04:45 UTC
    Casiano,

    Thanks, but I have already applied the patch that adds an object-oriented interface to my byacc package, so the object-oriented interface is not a problem.

    What I found is that I don't have much control over the Perl lexer.
    I have a lexer handle with the list of tokens defined in @tokens.
    $libLexer = Parse::YYLex->new(@tokens);

    I skipped the '\n' using the following.
    Parse::YYLex->skip('([\\ \t\r\n]+)|(\/\*.*\*\/)');


    But now I want to associate an action with '\n' without modifying the rules in the existing grammar, which works fine without this line-number facility.
    Is it possible to ignore '\n' after performing an associated action?