The simplest parser begins with the simplest language design. A language that needs only one token (or even two) of look-ahead will always be easier to parse than one that forces you to consume a long run of tokens before you know what the run means. For example, suppose we had a language that looked like this:
```
action article target "\n"

action  ::= "WALK" | "FEED" | "PLAY WITH"
target  ::= "DOG" | "CAT" | "CANARY" | "FISH"
article ::= "A" | "THE"
```
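Because every rule in this grammar is just a fixed list of alternatives, the grammar can be transcribed almost directly into code. Here is a minimal sketch (a regex-based alternative, not the split-based approach used in the parser that follows) that builds one anchored regular expression from the three word lists. The helper names `alternation` and `parse_statement` are hypothetical. One assumption to note: `quotemeta` escapes the space in PLAY WITH, so those two words must be separated by exactly one space.

```perl
use strict;
use warnings;

# Each grammar rule becomes a list of alternatives.
my @aActions  = ('WALK', 'FEED', 'PLAY WITH');
my @aArticles = ('A', 'THE');
my @aTargets  = ('DOG', 'CAT', 'CANARY', 'FISH');

# Hypothetical helper: join literal alternatives into a regex alternation,
# escaping any regex metacharacters (including the space in 'PLAY WITH').
sub alternation {
    return join '|', map { quotemeta($_) } @_;
}

my $sActionRe  = alternation(@aActions);
my $sArticleRe = alternation(@aArticles);
my $sTargetRe  = alternation(@aTargets);

# Anchored at both ends: a line matches only if it is exactly
# one action, one article and one target.
my $reStatement = qr/^\s*($sActionRe)\s+($sArticleRe)\s+($sTargetRe)\s*$/;

# Hypothetical parser: returns (action, article, target),
# or an empty list if the line does not fit the grammar.
sub parse_statement {
    my ($sLine) = @_;
    return $sLine =~ $reStatement ? ($1, $2, $3) : ();
}

for my $sLine ('WALK THE DOG', 'PLAY WITH A FISH', 'WALK THE COW') {
    my @aParsed = parse_statement($sLine);
    print @aParsed
        ? "action=<$aParsed[0]> article=<$aParsed[1]> target=<$aParsed[2]>\n"
        : "no match: $sLine\n";
}
```

The regex engine handles the two-word action for us, which is the one spot where this grammar needs more than a single token of look-ahead.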

Language samples would look like this:

```
WALK THE DOG
FEED THE CANARY
PLAY WITH A FISH
```

and a parser (sans error detection) could be as simple as this:

```perl
use strict;
use warnings;

while (my $sLine = <DATA>) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @aFields = split(' ', $sLine);
    next unless @aFields;    # skip blank lines

    # the general case: action article target
    my $sAction  = shift @aFields;
    my $sArticle = shift @aFields;
    my $sTarget  = shift @aFields;

    # special case: PLAY WITH is a two-token action
    if ($sAction eq 'PLAY') {
        $sAction .= " $sArticle";    # $sArticle is really 'WITH'
        $sArticle = $sTarget;         # $sTarget is really the article
        $sTarget  = shift @aFields;   # the target hasn't been read yet
    }

    doSomethingWithStatement($sAction, $sArticle, $sTarget);
}

sub doSomethingWithStatement {
    my ($sAction, $sArticle, $sTarget) = @_;
    print "action=<$sAction> article=<$sArticle> target=<$sTarget>\n";
}

__DATA__
WALK THE DOG
FEED THE FISH
PLAY WITH A CANARY
```

Of course, for production purposes you would want to add some error checking. You might also want to make certain combinations of verbs and targets illegal: it doesn't make much sense to "WALK A FISH", for example. And if you really want to get fancy, you can explore the wonderful world of lexers. But hopefully this illustrates the basic idea.
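A minimal sketch of what that error checking might look like, under a few assumptions: the vocabulary hashes simply mirror the grammar, `parse_checked` is a hypothetical name, and `%hForbidden` is an assumed table of illegal action/target pairs (only WALK with FISH comes from the example above). Each check dies with a message, and the caller traps it with `eval`.

```perl
use strict;
use warnings;

# Vocabulary tables mirroring the grammar.
my %hActions  = map { $_ => 1 } ('WALK', 'FEED', 'PLAY WITH');
my %hArticles = map { $_ => 1 } ('A', 'THE');
my %hTargets  = map { $_ => 1 } ('DOG', 'CAT', 'CANARY', 'FISH');

# Assumed list of forbidden "ACTION TARGET" pairs; extend as needed.
my %hForbidden = ('WALK FISH' => 1);

# Hypothetical checked parser: returns (action, article, target)
# or dies with a descriptive message.
sub parse_checked {
    my ($sLine) = @_;
    my @aFields = split(' ', $sLine);

    my $sAction = shift @aFields;
    die "empty statement\n" unless defined $sAction;

    # PLAY must be followed by WITH to form the two-token action.
    if ($sAction eq 'PLAY') {
        my $sNext = shift @aFields;
        die "expected WITH after PLAY\n"
            unless defined $sNext && $sNext eq 'WITH';
        $sAction = 'PLAY WITH';
    }
    die "unknown action '$sAction'\n" unless $hActions{$sAction};

    my ($sArticle, $sTarget, @aExtra) = @aFields;
    die "missing article\n" unless defined $sArticle;
    die "unknown article '$sArticle'\n" unless $hArticles{$sArticle};
    die "missing target\n" unless defined $sTarget;
    die "unknown target '$sTarget'\n" unless $hTargets{$sTarget};
    die "unexpected tokens: @aExtra\n" if @aExtra;

    # Semantic check: the statement parses, but is it allowed?
    die "cannot $sAction $sArticle $sTarget\n"
        if $hForbidden{"$sAction $sTarget"};

    return ($sAction, $sArticle, $sTarget);
}

for my $sLine ('WALK THE DOG', 'WALK A FISH', 'FEED THE COW') {
    my @aStatement = eval { parse_checked($sLine) };
    if (@aStatement) {
        print "ok: @aStatement\n";
    } else {
        print "error: $@";
    }
}
```

Run as is, the loop should print `ok: WALK THE DOG`, then `error: cannot WALK A FISH` and `error: unknown target 'COW'`. Keeping the forbidden pairs in a hash rather than in the parsing logic means new rules can be added without touching the parser itself.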

I wanted to provide you with a link to an easy-to-understand, jargon-free article on designing an easy-to-parse language, but I've had little luck finding one. Perhaps another Monk with better search-term fu can help you with that.

Best, beth


In reply to Re: A minilanguage with the least effort? by ELISHEVA
in thread A minilanguage with the least effort? by esk555
