
For the last few months, I've been spending part of my time on an elaborate parsing project. (See "To model or not to model" and "To make the Model WORK".) My first pass was a regex-based approach. It worked fairly well, but it was difficult to extend and had trouble discovering what wasn't there, i.e., variables that had never been defined.

So I sucked it up and rewrote it with Parse::RecDescent. My first pass used five-element productions to let me choose operator precedence (there are seven levels in the grammar), but a cow-orker with a noggin reminded me of Bjarne Stroustrup's elegant calculator parser. The first version worked, but it took about ten hours to process 1800 variable-definition equations. A grammar implementing Stroustrup's cascade got me the results in eight hours, but watching megabytes of trace stream by showed me that it wasn't storing intermediate results; it was recalculating everything hundreds of times. It is certainly possible that my understanding of P::RD is the culprit, but several cow-orkers read the grammar over, and they couldn't see anything amiss.
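The grammar isn't shown, so the cause of the recalculation can't be pinned down from here. One common mechanism, offered only as an assumption, is ordered-choice backtracking without memoization: a rule shaped like `level : sub OP level | sub` parses `sub` for the first alternative and, when no operator follows, discards that work and parses `sub` again for the second. Cascaded through k precedence levels, a lone operand is re-parsed 2^k times, which is 128 times for seven levels. The plain-Perl sketch below (not Parse::RecDescent itself) counts atom visits for a naive version and for a factored version that loops over `OP sub` instead of backtracking. The operator tokens `op1` through `op7` and all sub names are hypothetical placeholders.

```perl
use strict;
use warnings;

# Counts how many times the innermost rule (a single number) is entered.
my $atom_calls = 0;

# Level 0 matches one numeric token and returns the next position, or undef.
sub atom {
    my ($toks, $pos) = @_;
    $atom_calls++;
    my $tok = $toks->[$pos] // '';
    return $tok =~ /^\d+\z/ ? $pos + 1 : undef;
}

# Naive ordered choice for level k (operator names are hypothetical):
#   level_k : level_{k-1} "op_k" level_k  |  level_{k-1}
# When no operator follows, the first alternative's work is thrown away and
# level_{k-1} is parsed again from scratch for the second alternative.
sub naive {
    my ($k, $toks, $pos) = @_;
    return atom($toks, $pos) if $k == 0;
    my $p = naive($k - 1, $toks, $pos);
    if (defined $p && ($toks->[$p] // '') eq "op$k") {
        my $q = naive($k, $toks, $p + 1);
        return $q if defined $q;
    }
    return naive($k - 1, $toks, $pos);    # backtrack: re-parse the same prefix
}

# Factored form: parse level_{k-1} once, then loop over ("op_k" level_{k-1}).
sub factored {
    my ($k, $toks, $pos) = @_;
    return atom($toks, $pos) if $k == 0;
    my $p = factored($k - 1, $toks, $pos);
    while (defined $p && ($toks->[$p] // '') eq "op$k") {
        $p = factored($k - 1, $toks, $p + 1);
    }
    return $p;
}

for my $k (1 .. 7) {
    $atom_calls = 0; naive($k, ['42'], 0);    my $n = $atom_calls;
    $atom_calls = 0; factored($k, ['42'], 0); my $f = $atom_calls;
    printf "%d level(s): naive %3d atom visits, factored %d\n", $k, $n, $f;
}
```

The naive count doubles with every level while the factored count stays at one. In Parse::RecDescent, the factored shape is usually written with a repetition specifier such as `(s?)` or with the `<leftop: ...>` directive.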

Quandary time: what to do? If I stayed with P::RD, I would have to rewrite the grammar so that all the work was done in the next-token { action } blocks. That seemed to defeat the purpose of P::RD.

I chose to rewrite everything from the ground up as a raw-language implementation of Stroustrup's algorithm, extended to handle the precedence levels, user-defined functions, and nested if-then-else constructs of the language I'm parsing. The rewrite took me a little over two weeks. The bottom line is that my solution time went from over eight hours to 36 seconds, an improvement of almost three orders of magnitude.
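As a quick check on that figure, taking the eight-hour run as the baseline:

```latex
\frac{8 \times 3600\ \mathrm{s}}{36\ \mathrm{s}} = \frac{28\,800\ \mathrm{s}}{36\ \mathrm{s}} = 800 \approx 10^{2.9}
```

So the speedup is at least 800×, just short of three orders of magnitude.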

There are times when modules are a gift, and there are times when they get in the way. My raw-language implementation gives me access to everything, and it's no harder to understand than the P::RD grammar, which was loaded with { action } blocks for special cases like divide-by-zero and nested elements. Each precedence level is handled by a subroutine less than twenty pretty-printed lines long, and writing and debugging the whole program took me only about as long as I had spent on the P::RD grammar across its several passes.
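For readers who haven't seen the technique, here is a minimal Perl sketch of a Stroustrup-style cascade with one subroutine per precedence level. It is not the code described above: it has only three levels (additive, multiplicative, primary) instead of seven, leaves out user-defined functions and if-then-else, and assumes a hypothetical definition syntax of numbers, identifiers, `+ - * /`, and parentheses. It also touches two points raised earlier: each variable's value is cached once computed, so intermediate results are not recalculated, and undefined or circularly defined variables are reported instead of silently skipped. The names `tokenize`, `expr`, `term`, `primary`, `value_of`, and `evaluate_all` are illustrative only.

```perl
use strict;
use warnings;

# Split an expression into numbers, identifiers, operators and parentheses
# (token syntax assumed for this sketch); any other character is an error.
sub tokenize {
    my ($text) = @_;
    my @toks;
    while ($text =~ /\G\s*(?:(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/()])|(\S))/g) {
        die "unexpected character '$2'\n" if defined $2;
        push @toks, $1;
    }
    return \@toks;
}

sub peek     { my ($st) = @_; return $st->{toks}[ $st->{pos} ] // '' }
sub next_tok { my ($st) = @_; return $st->{toks}[ $st->{pos}++ ] }

# Lowest precedence: + and -; each operand is handled one level down.
sub expr {
    my ($st) = @_;
    my $v = term($st);
    while (peek($st) =~ /^[-+]\z/) {
        my $op  = next_tok($st);
        my $rhs = term($st);
        $v = $op eq '+' ? $v + $rhs : $v - $rhs;
    }
    return $v;
}

# Middle precedence: * and /, with an explicit divide-by-zero check.
sub term {
    my ($st) = @_;
    my $v = primary($st);
    while (peek($st) =~ m{^[*/]\z}) {
        my $op  = next_tok($st);
        my $rhs = primary($st);
        die "divide by zero\n" if $op eq '/' && $rhs == 0;
        $v = $op eq '*' ? $v * $rhs : $v / $rhs;
    }
    return $v;
}

# Highest precedence: numbers, unary minus, variables, parentheses.
sub primary {
    my ($st) = @_;
    my $t = next_tok($st);
    die "unexpected end of input\n" unless defined $t;
    return 0 + $t                    if $t =~ /^\d/;
    return 0 - primary($st)          if $t eq '-';
    return value_of($t, $st->{ctx})  if $t =~ /^[A-Za-z_]/;
    die "unexpected token '$t'\n" unless $t eq '(';
    my $v = expr($st);
    die "expected ')'\n" unless peek($st) eq ')';
    $st->{pos}++;
    return $v;
}

# Evaluate a variable once and cache it, so intermediate results are reused;
# undefined and circular definitions are reported rather than skipped.
sub value_of {
    my ($name, $ctx) = @_;
    return $ctx->{cache}{$name} if exists $ctx->{cache}{$name};
    die "undefined variable '$name'\n" unless exists $ctx->{defs}{$name};
    die "circular definition of '$name'\n" if $ctx->{active}{$name};
    local $ctx->{active}{$name} = 1;
    my $st = { toks => tokenize($ctx->{defs}{$name}), pos => 0, ctx => $ctx };
    my $v  = expr($st);
    die "trailing input in '$name'\n" if $st->{pos} < @{ $st->{toks} };
    return $ctx->{cache}{$name} = $v;
}

# Evaluate every definition, collecting values and per-variable errors.
sub evaluate_all {
    my ($defs) = @_;
    my $ctx = { defs => $defs, cache => {}, active => {} };
    my %errors;
    for my $name (sort keys %$defs) {
        next if exists $ctx->{cache}{$name};
        eval { value_of($name, $ctx); 1 }
            or do { (my $err = $@) =~ s/\n\z//; $errors{$name} = $err };
    }
    return ($ctx->{cache}, \%errors);
}

# Hypothetical input: 'bonus' is never defined, and 'ratio' divides by zero.
my %defs = (
    rate   => '0.5',
    base   => '(2 + 3) * 4',
    scaled => 'base * rate',
    total  => 'base * rate + bonus',
    ratio  => 'base / (rate - 0.5)',
);
my ($values, $errors) = evaluate_all(\%defs);
printf "%-6s = %s\n", $_, $values->{$_} for sort keys %$values;
printf "%-6s : %s\n", $_, $errors->{$_} for sort keys %$errors;
```

Each level loops over its own operators and hands tighter-binding work to the subroutine below it, so adding a precedence level means inserting one more subroutine of the same shape between two existing ones.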

So, before you load up your code with a lot of use XYZ::Foo::Bar statements, think about it. Your language might just be your best friend!

In reply to "On modules" by samizdat
