
BrowserUk:

One advantage of splitting the parser and lexer is that rather than having one humongous state machine that has to cover both grammar and lexing, you can split the task into two smaller machines. As you know, state machine complexity tends to grow much faster than the job it handles (folding one machine's work into another's can multiply their state counts), so two state machines half the size of a combined one can be *much* more tractable.

Another advantage of splitting is that you can use different techniques in different places. You might use a few simple regexes for your tokenizer and some recursive descent for your parser. If necessary, you can switch techniques in one or the other without rewriting *everything*.
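Here's a minimal sketch of that split: a regex-based tokenizer feeding a recursive-descent parser. It's written in Python purely for illustration, and the toy arithmetic grammar and every name in it are hypothetical, not taken from this thread.

```python
import re

# Hypothetical toy grammar, made up for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

# Lexer: one regex with a named group per token kind.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/()]))")


def tokenize(text):
    """Turn source text into a list of (kind, value) tuples ending in EOF."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character near offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i]

    def take(self, value=None):
        # Consume the current token, optionally insisting on a specific value.
        _, val = self.tokens[self.i]
        if value is not None and val != value:
            raise SyntaxError(f"expected {value!r}, got {val!r}")
        self.i += 1
        return val

    def expr(self):
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, val = self.peek()
        if kind == "NUMBER":
            self.take()
            return int(val)
        if val == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        raise SyntaxError(f"unexpected token {val!r}")


def parse(text):
    """Lex, then parse; returns a nested-tuple syntax tree."""
    p = Parser(tokenize(text))
    tree = p.expr()
    if p.peek()[0] != "EOF":
        raise SyntaxError(f"trailing input: {p.peek()[1]!r}")
    return tree


print(parse("2 + 3 * (4 - 1)"))  # ('+', 2, ('*', 3, ('-', 4, 1)))
```

Since the parser only ever looks at `(kind, value)` tuples, you could swap `tokenize` for a hand-written scanner (or a generated one) without touching the grammar methods, as long as the replacement produces the same token kinds.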

The last advantage (that I'm writing about) of having a separate parser is that you can more easily tie "meaning" to various locations in your grammar. For example, if you're doing some simple lexing, you might discover a symbol name. But what does the symbol *mean* once you lex it? Is it part of a declaration, a function name, a reference on the LHS, or a use on the RHS? Tying the meaning to particular spots in the syntax is a pretty nice feature.
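To make that concrete, here's a small Python sketch (again purely illustrative; the toy statement syntax, the `RoleParser` class and the role labels are all hypothetical). The lexer just reports `IDENT` for every name, even `var`; it's the parser, based on *where* in the grammar the name turns up, that decides whether it's a declaration, an assignment target, a function call or a plain use.

```python
import re

# Hypothetical toy statement grammar, made up for this sketch:
#   stmt := 'var' IDENT ';' | IDENT '=' expr ';'
#   expr := atom ('+' atom)*
#   atom := IDENT '(' expr ')' | IDENT | NUMBER
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>[=+;()]))")


def tokenize(text):
    """Lexer: knows nothing about meaning, just (kind, value) pairs."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character near offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


class RoleParser:
    """Recursive-descent parser that records each identifier's role."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0
        self.roles = []  # (name, role) pairs in source order

    def peek(self, k=0):
        return self.tokens[self.i + k]

    def take(self, kind=None, value=None):
        tok = self.tokens[self.i]
        if (kind and tok[0] != kind) or (value and tok[1] != value):
            raise SyntaxError(f"unexpected {tok[1]!r}")
        self.i += 1
        return tok[1]

    def program(self):
        while self.peek()[0] != "EOF":
            self.statement()
        return self.roles

    def statement(self):
        if self.peek()[1] == "var":  # keyword decided here, not in the lexer
            self.take(value="var")
            self.roles.append((self.take("IDENT"), "declaration"))
        else:
            self.roles.append((self.take("IDENT"), "assigned (LHS)"))
            self.take(value="=")
            self.expr()
        self.take(value=";")

    def expr(self):
        self.atom()
        while self.peek()[1] == "+":
            self.take(value="+")
            self.atom()

    def atom(self):
        kind, val = self.peek()
        if kind == "NUMBER":
            self.take("NUMBER")
        elif kind == "IDENT" and self.peek(1)[1] == "(":
            self.roles.append((self.take("IDENT"), "function call"))
            self.take(value="(")
            self.expr()
            self.take(value=")")
        elif kind == "IDENT":
            self.roles.append((self.take("IDENT"), "used (RHS)"))
        else:
            raise SyntaxError(f"unexpected {val!r}")


def identifier_roles(text):
    return RoleParser(tokenize(text)).program()


print(identifier_roles("var x; x = y + f(z);"))
```

The name `x` gets two different roles from the same kind of token, purely because the parser saw it at two different spots in the grammar.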

If you're keeping track of enough information in your lexer to know what's going on and whether the syntax is valid, then I'd argue that you haven't written a lexer; you've written a combined lexer/parser. After all, parsing is simply recognizing correct 'statements' of the grammar, so if you can more easily merge the tokenization with the rule checks, I wouldn't worry about the 'fancy' methods. While there's nothing wrong with that approach, it can become burdensome once the language is large or complex enough.

By the way, I just started playing with Marpa::R2 this afternoon, after reading this thread, so I know little about it so far. But having said that, I really recommend it over Parse::RecDescent. When I tried Parse::RecDescent, it took me forever to start getting results, and it was painful, too. The debugging capabilities really drove me mad. (Reading the trace in Parse::RecDescent is *painful*.)

But my puttering around with Marpa today was much more enjoyable. I got better results *much* more easily, and after a couple hours, I had a good start on a parser for a toy language. (I'll hopefully get to finish the parser tomorrow.) If I can get it into reasonable shape, I'll try to (remember to) post it.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re^7: Block-structured language parsing using a Perl module? by roboticus
in thread Block-structured language parsing using a Perl module? by BrowserUk
