beppu has asked for the wisdom of the Perl Monks concerning the following question:
I've been using Parse::RecDescent for the past few months, and I've been hitting some of its limits. It becomes way too slow when parsing large (64KB+) scalars. Don't even think about handing it a 1MB scalar. You will wait for hours for it to finish. That being said, I still think Damian Conway is the man, but I need to find a faster solution.
I've decided on Parse::Yapp which seems to be modeled after yacc. There is nothing like lex that goes with it, though -- you're supposed to provide your own lexer. Unfortunately for me, I don't have much in the way of formal knowledge when it comes to parsing. Using Parse::RecDescent was my first experience with actually writing a grammar and systematically parsing text. ...and Parse::RecDescent kinda meshes the lexing and parsing parts together.
I don't know where to draw the line between lexer and parser. My understanding is that a lexer is supposed to go through data and make tokens out of it to feed to the parser. The parser has higher level knowledge about these tokens in the form of a grammar and tries to make sense of this stream of tokens. That seems simple enough at first, but when I try to think about how I would write a lexer, it seems as if the lexer needs to know a little about the grammar.
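One way to picture the split described above is a toy pipeline in which the lexer knows only character-level patterns and the parser knows only the order in which tokens may appear. This is a minimal sketch in Python rather than Perl, and it is not how Parse::Yapp or Parse::RecDescent work internally. The token names, the `lex` and `parse_assignment` functions, and the tiny grammar are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical token set for a toy language (not Perl's real lexical rules).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[=+]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text):
    """Lexer: turns characters into (type, value) pairs; knows no grammar."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":  # whitespace is dropped, never seen by the parser
            yield (m.lastgroup, m.group())


def parse_assignment(tokens):
    """Parser for the toy grammar:
         assignment := NAME '=' term ('+' term)*
         term       := NAME | NUMBER
    It only looks at token types and values, never at raw characters."""
    toks = list(tokens)
    i = 0

    def term():
        nonlocal i
        if i < len(toks) and toks[i][0] in ("NAME", "NUMBER"):
            value = toks[i][1]
            i += 1
            return value
        raise SyntaxError(f"expected NAME or NUMBER at token {i}")

    if not toks or toks[0][0] != "NAME":
        raise SyntaxError("expected NAME at token 0")
    target = toks[0][1]
    i = 1
    if i >= len(toks) or toks[i] != ("OP", "="):
        raise SyntaxError(f"expected '=' at token {i}")
    i += 1
    terms = [term()]
    while i < len(toks) and toks[i] == ("OP", "+"):
        i += 1
        terms.append(term())
    if i != len(toks):
        raise SyntaxError(f"unexpected token {toks[i]!r}")
    return ("assign", target, terms)
```

Calling `parse_assignment(lex("x = 1 + y2"))` runs the lexer lazily and hands its tokens to the parser. The lexer would happily tokenize `= x`, and only the parser rejects it, which is roughly where the line between the two falls in this sketch.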
Let me give a contrived example.
my $x = "my $x = \"my $x\"";
In this case, my $x occurs three times in two different contexts: once as a perlop and variable, and the other two times as part of a quoted string. If any of you out there have any experience with writing lexers, help me out here.
What would be the stream of tokens returned by the above line of code assuming Perl semantics? I have never written a lexer so it'd be nice if I could get an example with some rationale. What I'm trying to find is what the right balance of responsibilities is between a lexer and a parser. I'd be especially interested in what goes on inside the quoted string. Is it all one big token, or is it split into a bunch of little tokens for the parser to assemble together into something coherent?
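As one possible answer to the quoted-string question, here is a minimal sketch of a lexer that treats the whole double-quoted string as a single token, so the `my $x` inside the quotes never reaches the parser as a keyword and a variable. It is written in Python rather than Perl, and it is an assumption about one reasonable design, not a description of what perl itself does. The token names and the `tokenize` function are hypothetical, and variable interpolation inside the string is deliberately left to a later stage.

```python
import re

# Hypothetical token set covering only the contrived example line.
# STRING comes first so an opening quote always starts a string token.
TOKEN_SPEC = [
    ("STRING",   r'"(?:\\.|[^"\\])*"'),  # double-quoted, backslash escapes allowed
    ("KEYWORD",  r"my\b"),
    ("VARIABLE", r"\$\w+"),
    ("ASSIGN",   r"="),
    ("SEMI",     r";"),
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (type, text) tokens; the quoted string stays whole."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

Fed the example line, `tokenize` yields five tokens: a KEYWORD, a VARIABLE, an ASSIGN, one STRING holding everything between the outer quotes (escaped quotes included), and a SEMI. The lexer needs to know where a string starts and ends, but nothing about the grammar of statements. If the `$x` inside the string should be interpolated, a separate pass over the STRING token's contents could handle that.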
If you've read this far, thanks for your patience.
Replies are listed 'Best First'.

Re: The Relation Between Lexers and Parsers
by rlk (Pilgrim) on Jan 20, 2001 at 03:25 UTC
by beppu (Hermit) on Jan 20, 2001 at 03:33 UTC
by mirod (Canon) on Jan 20, 2001 at 13:29 UTC

Re: The Relation Between Lexers and Parsers
by cephas (Pilgrim) on Jan 20, 2001 at 10:36 UTC

Re: The Relation Between Lexers and Parsers
by MeowChow (Vicar) on Jan 20, 2001 at 10:03 UTC

Re: The Relation Between Lexers and Parsers
by cephas (Pilgrim) on Jan 20, 2001 at 03:46 UTC

Re: The Relation Between Lexers and Parsers
by t'mo (Pilgrim) on Jan 20, 2001 at 07:55 UTC

Re: The Relation Between Lexers and Parsers
by beppu (Hermit) on Jan 21, 2001 at 21:17 UTC