Use Parse::Eyapp or Parse::Yapp instead of byacc. Both provide an OOP interface. I would not recommend Parse::YYLex: it seems unstable.

The usual way to handle line numbers is to make the attribute associated with the token somewhat richer than a plain scalar: an anonymous array or hash.
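As a minimal sketch of the two shapes such an attribute can take (the helper names `make_token_array` and `make_token_hash` are made up for illustration and are not part of Parse::Eyapp or Parse::Yapp):

```perl
use strict;
use warnings;

# Hypothetical helpers, only for illustration.

# Array-style attribute: [ value, line ]
sub make_token_array {
    my ($type, $value, $line) = @_;
    return ($type, [ $value, $line ]);
}

# Hash-style attribute: { value => ..., line => ... }
sub make_token_hash {
    my ($type, $value, $line) = @_;
    return ($type, { value => $value, line => $line });
}

# The consumer unpacks the attribute to get both the value and the line.
my ($type, $attr) = make_token_array('INUM', '42', 7);
printf "%s '%s' at line %d\n", $type, $attr->[0], $attr->[1];

my ($type2, $attr2) = make_token_hash('ID', 'count', 9);
printf "%s '%s' at line %d\n", $type2, $attr2->{value}, $attr2->{line};
```

The array form is more compact; the hash form is easier to extend later, for example with a column number.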

The following is an example of a lexer for a C-like language written in Parse::Eyapp. The token attribute is an array reference holding the value and the line number. Thus, for a number we return

('INUM',[$1, $tokenbegin])
Lines 9-23 show how to compute line numbers using the tr operator. Newlines are skipped along with the other "white spaces":
$ sed -ne '513,567p' Types.eyp | cat -n
     1  sub _Lexer {
     2    my($parser)=shift;
     3
     4    my $token;
     5    for ($parser->YYData->{INPUT}) {  # warn! false for
     6      return('',undef) if !defined($_) or $_ eq '';
     7
     8      #Skip blanks
     9      s{\A
    10         ((?:
    11              \s+       # any white space char
    12          |   /\*.*?\*/ # C like comments
    13          )+
    14         )
    15       }
    16       {}xs
    17        and do { # Count line numbers
    18          my($blanks)=$1;
    19
    20          #Maybe At EOF
    21          return('', undef) if $_ eq '';
    22          $tokenend += $blanks =~ tr/\n//;
    23      };
    24
    25      $tokenbegin = $tokenend;
    26
    27      s/^('.')//
    28        and return('CHARCONSTANT', [$1, $tokenbegin]);
    29
    30      s/^([0-9]+(?:\.[0-9]+)?)//
    31        and return('INUM',[$1, $tokenbegin]);
    32
    33      s/^([A-Za-z][A-Za-z0-9_]*)//
    34        and do {
    35          my $word = $1;
    36          my $r;
    37          return ($r, [$r, $tokenbegin]) if defined($r = $reserved{$word});
    38          return('ID',[$word, $tokenbegin]);
    39      };
    40
    41      m/^(\S\S)/ and defined($token = $1) and exists($lexeme{$token})
    42        and do {
    43          s/..//;
    44          return ($token, [$token, $tokenbegin]);
    45        }; # do
    46
    47      m/^(\S)/ and defined($token = $1) and exists($lexeme{$token})
    48        and do {
    49          s/.//;
    50          return ($token, [$token, $tokenbegin]);
    51        }; # do
    52
    53      die "Unexpected character at $tokenbegin\n";
    54    } # for
    55  }
The hash %reserved contains the reserved words of the language. Lines 41-51 deal with lexemes such as '=' and '**': two-character lexemes are tried first, then one-character ones. The hash %lexeme holds something like:
my %lexeme = (
  '='  => "ASSIGN",
  .................
  ']'  => "RIGHTBRACKET",
  '==' => "EQ",
  '+=' => "PLUSASSIGN",
  ...................
  '--' => "DEC",
  '**' => "EXP"
);
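If you want to try the idea without a grammar at hand, here is a rough standalone sketch of the same technique: count newlines with tr while skipping blanks and comments, then try two-character lexemes before one-character ones. It is not the code from Types.eyp. `make_lexer` is a hypothetical helper, the two tables are tiny made-up samples, and the line counter lives in a closure rather than in the parser object.

```perl
use strict;
use warnings;

# Illustrative tables (assumed, far smaller than a real language needs).
my %reserved = (if => 'IF', while => 'WHILE');
my %lexeme   = ('=' => 'ASSIGN', '==' => 'EQ', '+' => 'PLUS', ';' => 'SEMICOLON');

# Hypothetical helper: returns a closure yielding (token, [value, line])
# pairs, and ('', undef) at end of input.
sub make_lexer {
    my ($input) = @_;
    my $line = 1;

    return sub {
        for ($input) {    # alias $_ to the remaining input
            # Skip white space and C-like comments, counting removed newlines.
            if (s{\A((?:\s+|/\*.*?\*/)+)}{}s) {
                my $blanks = $1;
                $line += ($blanks =~ tr/\n//);
            }
            return ('', undef) if $_ eq '';

            s/\A([0-9]+(?:\.[0-9]+)?)//
                and return ('INUM', [$1, $line]);

            if (s/\A([A-Za-z][A-Za-z0-9_]*)//) {
                my $word = $1;
                my $r    = $reserved{$word};
                return defined $r ? ($r, [$r, $line]) : ('ID', [$word, $line]);
            }

            # Longest match first: two-character lexemes, then one-character.
            for my $len (2, 1) {
                my $candidate = substr($_, 0, $len);
                if (length($candidate) == $len && exists $lexeme{$candidate}) {
                    substr($_, 0, $len, '');
                    return ($candidate, [$candidate, $line]);
                }
            }

            die "Unexpected character at line $line\n";
        }
    };
}

# Usage: print every token with the line it starts on.
my $lex = make_lexer("x = 1;\n/* a\n comment */\nif y == 2;");
while (1) {
    my ($tok, $attr) = $lex->();
    last if $tok eq '';
    print "$tok '$attr->[0]' line $attr->[1]\n";
}
```

Trying the longer lexeme first is what keeps '==' from being read as two '=' tokens.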

Hope it helps

Casiano


In reply to Re: Parsing using Parse::YYLex package by casiano
in thread Parsing using Parse::YYLex package by Anonymous Monk
