Good Day Fellow Monks,

I am considering writing a parser and wanted some advice. Before I pose my main questions, I want to avoid a potential XY Problem and tell you what I need to accomplish.

For the last five years, I have developed and maintained a Perl object persistence layer. It is one of those APIs that let you define fields in a Perl object and it takes care of all the database stuff. I also developed a query language for it. The language is quite powerful with a minimum amount of syntax.

So, all this is working great for us. However, I want to extend the query language (QL) and its current implementation doesn't lend itself well to that. :-)

First, I started looking for existing Object Query Language standards and/or implementations that, perhaps, I could use instead of maintaining my own. Well, this area of the industry is still pretty underdeveloped. I only found a few standards (e.g. JDO and J2EE's CMP) and they had rather limited functionality in contrast to our current implementation. So, my first question: does anyone know of any good language standards or (better) actual implementations that I should look at?

Back to enhancing our current QL: the current QL compiler basically runs regexes against a QL string, looking for field navigation instances (e.g. "department.managers.firstName") and replacing them with corresponding database fields and joining in related tables where necessary.
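To make the regex approach concrete, here is a minimal sketch of that kind of substitution. It is not the actual compiler: the %column_for mapping, the column names, and the translate_paths function are hypothetical, and the join handling is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical mapping from a navigation path to a database column.
# The real layer would derive this from object metadata, not a fixed hash.
my %column_for = (
    'department.managers.firstName' => 'mgr.first_name',
    'salary'                        => 'emp.salary',
);

# Find each dotted identifier path and swap in its column if one is known.
# Paths without a mapping are left untouched.
sub translate_paths {
    my ($ql) = @_;
    $ql =~ s{\b([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\b}
            { exists $column_for{$1} ? $column_for{$1} : $1 }ge;
    return $ql;
}

print translate_paths('department.managers.firstName = ?Name and salary > ?Min'), "\n";
```

This works for flat paths, but it also shows the limitation: the regex has no notion of nesting, so constructs such as where(...) or function calls need ever more special cases.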

The regexes work fine, but the current code is not very maintainable. Either way, I am interested in doing a re-write of the compiler before I start adding the additional functionality.

So, I am looking hard at this and thinking: "Do I want to continue to use plain regexes, or completely parse the QL?" If I am going to parse, what API/technique should I use? My prioritized requirements are:

Lightweight: For ease of installation and portability, I want to avoid native modules, at least at the moment.
Speed: The persistence API is used mostly in a CGI environment. During a request, we use 1 to 5 queries on average. The QL statements are generally very short, very rarely more than 300 characters long.
Ease of Development: Well, I am lazy. I don't want to work hard. If I have to build a compiler, I would like to have fun doing it. At least this is my lowest priority of the three. :-)

This is when I bought Higher Order Perl for some inspiration. I like the lexing techniques proposed in the book. However, a lot of the stream oriented stuff is overkill here since I am dealing with small strings.
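For short strings, a lexer can skip the streams and simply walk the string with \G and /gc, collecting all tokens into an array up front. The sketch below assumes that approach; the token types and patterns in @token_spec are my guesses at the QL's lexical structure for illustration, not the real grammar.

```perl
use strict;
use warnings;

# Illustrative token table (assumed, not the actual QL definition).
# Order matters: the first pattern that matches at the current position wins.
my @token_spec = (
    [ PARAM  => qr/\?[A-Za-z_]\w*/ ],
    [ STRING => qr/'(?:[^'\\]|\\.)*'/ ],
    [ NUMBER => qr/\d+(?:\.\d+)?/ ],
    [ IDENT  => qr/[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*/ ],
    [ OP     => qr/>=|<=|<>|\.\.|[-+*\/=<>.,()\[\]{}]/ ],
);

# Return an array ref of [type, text] pairs for the whole string.
sub tokenize {
    my ($ql) = @_;
    my @tokens;
    pos($ql) = 0;
    TOKEN: while (1) {
        $ql =~ /\G\s+/gc;                      # skip whitespace, keep pos on failure
        last if pos($ql) >= length $ql;
        for my $spec (@token_spec) {
            my ($type, $re) = @$spec;
            if ($ql =~ /\G($re)/gc) {          # anchored match at current position
                push @tokens, [ $type, $1 ];
                next TOKEN;
            }
        }
        die sprintf "Unexpected character '%s' at offset %d\n",
            substr($ql, pos($ql), 1), pos($ql);
    }
    return \@tokens;
}

print join(' ', map { "$_->[0]($_->[1])" } @{ tokenize('salary >= ?Min') }), "\n";
```

The /c modifier keeps pos() where it was when a pattern fails, which is what lets the loop try each token pattern in turn at the same offset.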

When it comes to parsing, the book implements a Left Hand Recursive Descent Parser. This book is a lot of fun BTW! When I saw that I thought of Parse::RecDescent, which would work wonders. But, I have heard many times that Parse::RecDescent is slow.

Has anyone used Parse::RecDescent in a CGI context to compile an average of 3 short strings per request? If so, how was performance?

I have started on an initial re-implementation. I lex up the strings nicely; there is no problem here. As I am parsing the tokens and converting them to SQL, I am finding myself writing a lot of recursion. So, this is why I ask if Parse::RecDescent is slow because of features, or because of the recursive implementation.
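To show the kind of recursion I mean, here is a minimal hand-written recursive descent sketch over a flat token list. The grammar is an assumed subset (comparisons joined by and/or, with parentheses), nothing like the full QL, and the sub names are made up for this example.

```perl
use strict;
use warnings;

# Assumed grammar (illustrative subset only):
#   expr := term ('or' term)*
#   term := comp ('and' comp)*
#   comp := '(' expr ')' | WORD OP WORD
sub parse_expr {
    my ($tokens) = @_;
    my $left = parse_term($tokens);
    while (@$tokens && lc($tokens->[0]) eq 'or') {
        shift @$tokens;
        $left = [ 'or', $left, parse_term($tokens) ];
    }
    return $left;
}

sub parse_term {
    my ($tokens) = @_;
    my $left = parse_comp($tokens);
    while (@$tokens && lc($tokens->[0]) eq 'and') {
        shift @$tokens;
        $left = [ 'and', $left, parse_comp($tokens) ];
    }
    return $left;
}

sub parse_comp {
    my ($tokens) = @_;
    die "Unexpected end of input\n" unless @$tokens;
    if ($tokens->[0] eq '(') {
        shift @$tokens;
        my $inner = parse_expr($tokens);       # recursion happens only on nesting
        my $close = shift @$tokens;
        die "Expected ')'\n" unless defined $close && $close eq ')';
        return $inner;
    }
    die "Incomplete comparison\n" if @$tokens < 3;
    my ($lhs, $op, $rhs) = splice @$tokens, 0, 3;
    return [ $op, $lhs, $rhs ];
}

# Render the tree as a fully parenthesised string to make precedence visible.
sub to_string {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, $l, $r) = @$node;
    return '(' . to_string($l) . " $op " . to_string($r) . ')';
}

my @tokens = split ' ', 'a = ?A or b > ?B and c < ?C';
print to_string(parse_expr(\@tokens)), "\n";
```

In a parser shaped like this, the call depth follows the nesting depth of the expression rather than the length of the string, so for 300-character queries the recursion itself stays shallow.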

I also checked out Parse::Yapp. This is basically a Perl version of yacc. I thought that there may be a performance benefit by generating a parser as opposed to having the API re-interpret a grammar definition each time. However, the docs suggest that the closest Parse::Yapp gets to making a stand-alone parser is it copies the interpreter code into the generated module along with the grammar definition. So, on that ground it is equal to Parse::RecDescent. Anyone have any experience with the performance of either of these?

Normally benchmarking would be a simple answer to all of this. But, because writing a parser is a non-trivial task, it would be very difficult to implement different parsers in Parse::RecDescent, Parse::Yapp and plain regexes to test them. I am hoping that our community can share with me their ideas so that I can make a more informed decision.
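If I did benchmark, a cheaper route might be to compare the candidates on a small representative subset of the QL with the core Benchmark module. The sketch below only shows the harness: compile_regex and compile_tokens are placeholder workloads I made up, standing in for real compilers, so the numbers they produce say nothing about Parse::RecDescent or Parse::Yapp.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $ql = 'department.managers.firstName = ?Name and salary > ?Min';

# Placeholder candidate A: one global substitution over the whole string.
sub compile_regex {
    my ($src) = @_;
    (my $out = $src) =~ s/\./_/g;
    return $out;
}

# Placeholder candidate B: split into words, rewrite each word, rejoin.
sub compile_tokens {
    my ($src) = @_;
    return join ' ', map { my $t = $_; $t =~ tr/./_/; $t } split ' ', $src;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { compile_regex($ql) },
    tokens => sub { compile_tokens($ql) },
});
```

In a plain CGI setup the per-request startup cost (loading modules, building the parser) may matter as much as the per-query cost, so that would need to be timed separately.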

Thanks for reading this far!

Update: dragonchild asked to see an example of the QL. Here are a few examples of some of the supported syntax. They should be pretty self-explanatory.

# All invoices above a MinValue in a City and State
select object()
from Project::Invoice
where lineitems.total.sum() > ?MinValue
and billto.address.where(city = ?City and state = ?State)

# All employees who were paid less than the value of the 
# projects for which they worked.
select object()
from Foo::Employee
where salary > parents(Foo::Project, employees).billings.lineitems.sum()

# All contacts whose first and second alternates are in
# State and whose fax starts with AreaCode
# or have referred N jobs above Value in dollars
select object()
from Bar::Contact
where alternates[0..1].address.state = ?State
and numbers{fax} like concat(?AreaCode, '%')
or refferals.jobs.where(count() >= ?N and billings.lineitems.total.sum() >= ?Value) is not empty

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)

In reply to Object Query Languages and Parsers by TedYoung
