in reply to Creating parser for some syntax

Effectively, this style of configuring can be considered an RPC

You've lost me there already. Could you please explain what commonalities you see between RPC (which I assume means "Remote procedure call") and configuration by embedding a scripting language?

And since you mention danger later on, I see a huge difference here. In the RPC situation, both the caller and the callee have to guard against malicious incoming data. In the case of configuration via scripting languages, the attitude is often "if you screw up your program in the configuration, that's your problem", since the one who writes the configuration is usually also the user of the system, or a trusted party (like an administrator).

Now, let's return to the parser itself. We want it to be simple, so we just imagine, that there is chain of objects that may look at the input text and either succeed or fail at producing our expression object. So effectively, our parser should just maintain a chain of objects and during parsing simply activate all of them in turn until one of them produces desired result, after that the procedure repeats. The parsing objects should simply advance the input past the text they have recognized.

That sounds rather similar to how Perl 6 regexes are implemented. There, each call to a subregex returns a "Cursor" object, which holds a reference to the original string, records where the last regex call left off, and carries a few other pieces of data.

If a regex can match in several ways (i.e. if the engine can backtrack over that regex), a lazy list of cursors is returned instead of a single cursor. For parsing simple languages you can often get very far without ever using backtracking.
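To make the cursor idea concrete, here is a minimal sketch in Python of the chain-of-parsers scheme quoted above, with each matcher taking a cursor and returning a new one. The names `Cursor`, `literal`, `pattern` and `first_of` are invented for illustration; this is not how Perl 6 actually implements its regexes, and it deliberately leaves out backtracking.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Cursor:
    """Immutable view of the input: the whole string plus the current offset."""
    text: str
    pos: int = 0
    value: object = None  # whatever the last successful matcher recognized

# A matcher takes a cursor and returns a new cursor on success, None on failure.
Matcher = Callable[[Cursor], Optional[Cursor]]

def literal(word: str) -> Matcher:
    def match(cur: Cursor) -> Optional[Cursor]:
        if cur.text.startswith(word, cur.pos):
            return Cursor(cur.text, cur.pos + len(word), word)
        return None
    return match

def pattern(regex: str) -> Matcher:
    compiled = re.compile(regex)
    def match(cur: Cursor) -> Optional[Cursor]:
        m = compiled.match(cur.text, cur.pos)
        if m:
            return Cursor(cur.text, m.end(), m.group())
        return None
    return match

def first_of(*matchers: Matcher) -> Matcher:
    """Try each matcher in turn; the first success wins (no backtracking)."""
    def match(cur: Cursor) -> Optional[Cursor]:
        for matcher in matchers:
            result = matcher(cur)
            if result is not None:
                return result
        return None
    return match

token = first_of(
    literal("output"),
    pattern(r'"[^"]*"'),
    pattern(r"[A-Za-z_]\w*"),
)

cur = token(Cursor('output"Hi"'))
print(cur.pos, cur.value)   # 6 output
cur = token(cur)
print(cur.pos, cur.value)   # 10 "Hi"
```

Because a failed matcher returns `None` and never touches the incoming cursor, the caller can simply hand the same cursor to the next matcher in the chain.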

One thing that your scheme neglects is dealing with whitespace and comments. If you look at your examples again:

output "Hello there", friend_name
Written in BNF
<expression> ::= output <string-val> (',' <string-val>)*

You'll notice that there's no rule for parsing the whitespace in the input.

There are three possible solutions: writing a lexer that deals with the whitespace, taking care of whitespace in each rule (which clutters the grammar), or having data-driven parsing rules where another piece of code adds the whitespace handling.
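As a rough illustration of the first option, here is a small Python lexer that recognizes whitespace and comments but drops them, so the grammar rules never see them. The token names and the `#` comment syntax are my own assumptions; the original example does not define a comment syntax.

```python
import re

# One alternative per token kind; WS and COMMENT are recognized but discarded.
# The '#' comment syntax is an assumption made for this sketch.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<COMMENT>\#[^\n]*)
  | (?P<STRING>"[^"]*")
  | (?P<COMMA>,)
  | (?P<WORD>[A-Za-z_]\w*)
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("WS", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize('output "Hello there", friend_name  # greet'))
# [('WORD', 'output'), ('STRING', '"Hello there"'), ('COMMA', ','), ('WORD', 'friend_name')]
```

With this in place the BNF rule can stay exactly as written, because it operates on tokens rather than on raw characters.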

To me, this approach to providing RPC appears much more simple and effective than for example SOAP.

You've lost me again. You've described a parser (in the context of RPC), and now you say it's simpler than SOAP. But a parser itself can never replace a complete RPC protocol (unless your previous use of RPC was complete overkill).

You need a network layer, serialization and deserialization (of which parsing is only a part), error handling, security layers (authentication and authorization, request signing) and so on.

So, please describe your use case a bit more, because in the general case I have no idea how parsing can replace SOAP (or any RPC really), and I don't see the connection between configuration and RPC either. But I do think you're onto something interesting.

Re^2: Creating parser for some syntax
by JavaFan (Canon) on Nov 15, 2011 at 07:33 UTC
    For parsing simple languages you can often get very far without ever using backtracking.
    Actually, that's also the case for non-simple languages. Perl isn't usually regarded as a simple language, yet parsing it doesn't require backtracking. Many languages, including Perl, use a grammar that requires one token of lookahead -- although sometimes perl cheats and scans (but does not tokenize) the stream ahead.

    Limited lookahead parsing means that your grammar is written in such a way that the compiler only needs to look at a fixed number of tokens before it can decide which grammar rule it has to apply to continue parsing.

    If I were to make a language, be it an embedded or a standalone one, I would want the compile phase to be reasonably fast. I certainly don't want any backtracking, so a full-blown regexp would be out. I may use a bunch of simple regexes in the tokenizer, but the main parsing loop should be table driven or expressible as a state machine. But no backtracking.
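    A minimal Python sketch of what one-token lookahead looks like in practice, using the `output` rule from the original post. Note that this is a hand-written predictive parser rather than a table-driven one, and the token names, the `EOF` marker and the tuple-shaped result are all assumptions made for the example.

```python
import re

# Simple regexes in the tokenizer only; leading whitespace is skipped here.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<STRING>"[^"]*")|(?P<COMMA>,)|(?P<WORD>[A-Za-z_]\w*))'
)

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", ""))  # sentinel so peek() is always valid
    return tokens

def parse_expression(tokens):
    """<expression> ::= output <string-val> (',' <string-val>)*"""
    pos = 0

    def peek():
        return tokens[pos]

    def expect(kind, text=None):
        nonlocal pos
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text or tok_kind!r}")
        pos += 1
        return tok_text

    def parse_value():
        # The only decision point: one token of lookahead picks the rule.
        kind, _ = peek()
        if kind == "STRING":
            return ("str", expect("STRING")[1:-1])
        if kind == "WORD":
            return ("var", expect("WORD"))
        raise SyntaxError(f"expected a value, got {kind}")

    expect("WORD", "output")
    args = [parse_value()]
    while peek()[0] == "COMMA":
        expect("COMMA")
        args.append(parse_value())
    expect("EOF")
    return ("output", args)

print(parse_expression(tokenize('output "Hello there", friend_name')))
# ('output', [('str', 'Hello there'), ('var', 'friend_name')])
```

    The position only ever moves forward: every choice is made by inspecting the current token, so there is nothing to undo when a rule fails.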

Re^2: Creating parser for some syntax
by Anonymous Monk on Nov 15, 2011 at 08:08 UTC

    I found it very difficult to understand the OP, but I do believe the OP is in the process of inventing "Higher-Order Perl".

    Chapter 2 in particular explains a dispatch-table-based config file parser.
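    For readers who don't have the book at hand, the general idea of a dispatch-table config parser can be sketched roughly like this in Python. This is not the book's code; the directive names, the `#` comment handling and the `settings` dictionary are hypothetical.

```python
def read_config(lines, dispatch, settings):
    """Look up each directive in a table and call its handler."""
    for lineno, line in enumerate(lines, 1):
        line = line.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        directive, _, rest = line.partition(" ")
        handler = dispatch.get(directive.upper())
        if handler is None:
            raise ValueError(f"line {lineno}: unknown directive {directive!r}")
        handler(rest.strip(), settings)

# Hypothetical handlers; adding a directive means adding one table entry.
def set_verbosity(arg, settings):
    settings["verbosity"] = int(arg)

def set_logfile(arg, settings):
    settings["logfile"] = arg

DISPATCH = {
    "VERBOSITY": set_verbosity,
    "LOGFILE": set_logfile,
}

settings = {}
read_config(["# sample", "VERBOSITY 8", "logfile /tmp/app.log"], DISPATCH, settings)
print(settings)   # {'verbosity': 8, 'logfile': '/tmp/app.log'}
```

    The parsing loop itself never changes; the language grows by extending the table, which is close in spirit to the chain of parsing objects the OP describes.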

Re^2: Creating parser for some syntax
by andal (Hermit) on Nov 15, 2011 at 10:40 UTC
    You've lost me again. You've described a parser (in the context of RPC), and now you say it's simpler than SOAP. But a parser itself can never replace a complete RPC protocol (unless your previous use of RPC was complete overkill).

    Well, you are right, a parser does not replace everything involved in RPC. Still, for the network layer we need only serialization/deserialization. The rest (authentication, encryption and so on) can be part of the RPC code itself. Nothing prevents you from allowing only the authentication procedure for the first requests and then extending or replacing the parser for subsequent requests.

    Serialization can be very simple, for example something similar to HTTP's chunked transfer encoding: first comes a line with the length, then the specified number of bytes, and finally a line with zero to indicate the end of the request/response.
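    A rough Python sketch of such a framing, under my own assumptions: lengths are written in decimal (HTTP itself uses hexadecimal and extra CRLFs), and the stream is a buffered file-like object such as `io.BytesIO` or the result of `socket.makefile("rwb")`. The function names are made up for the example.

```python
import io

def write_message(stream, payload: bytes, chunk_size: int = 4096):
    """Write payload as length-prefixed chunks, ending with a '0' line."""
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        stream.write(b"%d\n" % len(chunk))   # decimal length line (assumption)
        stream.write(chunk)
    stream.write(b"0\n")                     # zero length marks the end

def read_message(stream) -> bytes:
    """Read chunks until the terminating '0' line and return the payload."""
    parts = []
    while True:
        header = stream.readline()
        if not header.endswith(b"\n"):
            raise EOFError("stream ended inside a length line")
        length = int(header.decode("ascii"))
        if length < 0:
            raise ValueError("negative chunk length")
        if length == 0:
            return b"".join(parts)
        chunk = stream.read(length)
        if len(chunk) != length:
            raise EOFError("stream ended inside a chunk")
        parts.append(chunk)

buf = io.BytesIO()
request = b'output "Hello there", friend_name'
write_message(buf, request, chunk_size=10)
buf.seek(0)
print(read_message(buf) == request)   # True
```

    Since the payload is length-prefixed, it may contain newlines or any other bytes without escaping; the receiver hands the reassembled text to the parser only once the zero line arrives.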

    Please don't consider my post an attempt to teach how to build a GOOD compiler. It was an attempt to describe one way to create a SIMPLE parser. This parser will be inefficient, since efficiency usually comes at the cost of complexity. But then, working with SOAP is neither efficient nor simple.

    SOAP is an attempt to make one thing suitable for all possible uses. I believe it is more appropriate to create tools that fit specific needs, and I'm searching for ways to make this approach simple.