I've long been interested in building a generic parser and command object model that could handle all of the command sentences in any Infocom game. Generalizing further, I developed the basic sentence structure below. I've implemented it in Java, C and Perl over the years.
[ subject , ] verb [ adverb ] [ dobject | dobjectlist | "stringliteral" ] [ preposition iobject ] [ . | ? | ! ]
The verb is the only required element; everything in brackets is optional.
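As an illustration only (this is not the code mentioned above, which isn't being shared), here is a minimal Perl sketch that splits a command typed in the canonical order into these fields. The word lists and the name parse_sentence are hypothetical, and the alternative grammars discussed below are ignored.

```perl
use strict;
use warnings;

# Hypothetical toy vocabularies; a real game would supply its own.
my %VERBS   = map { $_ => 1 } qw(give take put run say drop);
my %ADVERBS = map { $_ => 1 } qw(quickly quietly);
my %PREPS   = map { $_ => 1 } qw(to in on under with);

# Split a command into the fields of
#   [ subject , ] verb [ adverb ] [ dobject | "literal" ] [ prep iobject ] [ .?! ]
# Returns a hash ref, or undef if no verb is found (the verb is mandatory).
sub parse_sentence {
    my ($text) = @_;
    my %s;
    $s{punct}   = $1 if $text =~ s/\s*([.?!])\s*$//;
    $s{literal} = $1 if $text =~ s/\s*"([^"]*)"//;

    # A subject is a leading phrase followed by a comma and then a verb.
    if ($text =~ /^\s*([^,]+?)\s*,\s*(\w+)/ && $VERBS{ lc $2 }) {
        $s{subject} = $1;
        $text =~ s/^[^,]*,//;
    }

    my @words = split ' ', lc $text;
    return unless @words && $VERBS{ $words[0] };
    $s{verb}   = shift @words;
    $s{adverb} = shift @words if @words && $ADVERBS{ $words[0] };

    # Everything up to the first preposition is the direct object phrase.
    my @dobj;
    push @dobj, shift @words while @words && !$PREPS{ $words[0] };
    $s{dobject} = join ' ', @dobj if @dobj;
    if (@words) {
        $s{prep}    = shift @words;
        $s{iobject} = join ' ', @words if @words;
    }
    return \%s;
}

my $cmd = parse_sentence('floyd, give the circuit board to me.');
print join(', ', map { "$_=$cmd->{$_}" } sort keys %$cmd), "\n";
# prints: dobject=the circuit board, iobject=me, prep=to, punct=., subject=floyd, verb=give
```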

Real subjects, dobjects and iobjects (that is, actual objects rather than string literals) all follow the basic grammar:

[ article ] [ adjectives ] noun
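A sketch of this noun-phrase rule, assuming a hypothetical article list and representing each game object as a hash with a noun and a list of adjectives (a representation chosen for the example, not part of the scheme described here). The helper np_matches shows one plausible way to use a parsed phrase to pick objects.

```perl
use strict;
use warnings;

# Hypothetical article list; the pseudo-articles all/some/my are parsed
# as articles here too.
my %ARTICLES = map { $_ => 1 } qw(a an the all some my);

# [ article ] [ adjectives ] noun : the last word is always the noun.
sub parse_np {
    my ($text) = @_;
    my @w = split ' ', lc $text;
    return unless @w;
    my %np = (noun => pop @w);
    $np{article}    = shift @w if @w && $ARTICLES{ $w[0] };
    $np{adjectives} = [@w];
    return \%np;
}

# An object matches when the nouns agree and the object carries every
# adjective the player typed (extra adjectives on the object are fine).
sub np_matches {
    my ($np, $obj) = @_;
    return 0 unless $obj->{noun} eq $np->{noun};
    my %has = map { $_ => 1 } @{ $obj->{adjectives} };
    for my $adj (@{ $np->{adjectives} }) {
        return 0 unless $has{$adj};
    }
    return 1;
}

my $np      = parse_np('the small brass lamp');
my @objects = (
    { noun => 'lamp', adjectives => [qw(brass small)] },
    { noun => 'lamp', adjectives => [qw(broken)] },
);
my @found = grep { np_matches($np, $_) } @objects;
print scalar(@found), " match\n";    # prints "1 match"
```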

There are quite a few alternative grammars for the main sentence type, but the set of fields is fixed: once each field has been identified, it means the same thing no matter which grammar matched. For instance, it's okay to type the adverb before the verb. The adverb usually selects a different tradeoff for the same basic verb behavior, as in (run quickly) versus (run quietly). I would recommend against supporting multiple adverbs, especially adverbs modifying adverbs (like 'very').
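One way to keep the fields fixed across alternative grammars is to rewrite each alternative into the canonical order before parsing. The sketch below does that for an adverb typed before the verb, and rejects a second adverb in line with the recommendation above. The vocabularies and the name canonical_order are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical vocabularies.
my %VERBS   = map { $_ => 1 } qw(run walk open);
my %ADVERBS = map { $_ => 1 } qw(quickly quietly carefully);

# Rewrite the alternative "adverb verb ..." order into the canonical
# "verb adverb ..." order so later stages see one field layout.
# Dies if there is no verb, or if a second adverb directly follows the first.
sub canonical_order {
    my @w = @_;
    @w[0, 1] = @w[1, 0] if @w >= 2 && $ADVERBS{ $w[0] } && $VERBS{ $w[1] };
    die "no verb\n" unless @w && $VERBS{ $w[0] };
    die "only one adverb is supported\n"
        if @w >= 3 && $ADVERBS{ $w[1] } && $ADVERBS{ $w[2] };
    return @w;
}

print join(' ', canonical_order(qw(quietly open the door))), "\n";
# prints "open quietly the door"
```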

The subject, if specified, must be first and followed by a comma. It's up to the subject to "consent" to the request; they can decide for themselves whether or not to allow the command (floyd, give me the circuit board).
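A minimal sketch of the consent step, assuming a hypothetical Actor class whose consent method checks a per-character set of acceptable verbs. A real game could put arbitrary logic in that method; the class, the willing field and the dispatch function are all made up for this example.

```perl
use strict;
use warnings;

# Hypothetical actor class: each character decides whether to obey.
package Actor;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub name { return $_[0]{name} }

# Consents only to verbs listed in the actor's "willing" set.
sub consent {
    my ($self, $cmd) = @_;
    return $self->{willing}{ $cmd->{verb} } ? 1 : 0;
}

package main;

# Route a parsed command; when a subject is named, it must consent first.
sub dispatch {
    my ($cmd, $actors) = @_;
    return "You $cmd->{verb}." unless defined $cmd->{subject};
    my $who = $actors->{ $cmd->{subject} }
        or return "There is no $cmd->{subject} here.";
    return $who->consent($cmd)
        ? $who->name . " agrees to $cmd->{verb}."
        : $who->name . " refuses.";
}

my %actors = (floyd => Actor->new(name => 'Floyd', willing => { give => 1 }));
print dispatch({ subject => 'floyd', verb => 'give' }, \%actors), "\n";
# prints "Floyd agrees to give."
```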

The dobject is either a single object, a list of objects, or a string literal. In a list, items must be separated by the word "and", a comma, or both. Special pseudo-articles such as qw(all some the my) can guide the strategy used to search for multiple objects within a given search domain (put all goo in the box). Lastly, a string literal is used for things like dialogue (say "hello" to floyd). An alternative sentence grammar would assume that if the sentence consists only of a string literal, the verb is 'say', 'exclaim' or 'ask', depending on the final punctuation.
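The three dobject forms can be sketched as separate helpers. Splitting on commas and "and" follows the text directly; the meaning given to the pseudo-article "all" (every in-scope object with that name), the first-match fallback, the punctuation-to-verb mapping ('?' to ask, '!' to exclaim, otherwise say) and all function names are assumptions for illustration.

```perl
use strict;
use warnings;

# Split a direct-object phrase on commas and/or the word "and".
sub split_dobjects {
    my ($phrase) = @_;
    return grep { length } split /\s*(?:,|\band\b)\s*/, $phrase;
}

# Hypothetical search strategy for one pseudo-article: "all X" selects every
# in-scope object named X; otherwise only the first match is taken.
sub resolve {
    my ($phrase, @scope) = @_;
    my ($all, $noun) = $phrase =~ /^(all\s+)?(?:the\s+)?(.+)$/;
    my @hits = grep { $_ eq $noun } @scope;
    return @hits if $all;
    return @hits ? $hits[0] : ();
}

# Alternative grammar: a sentence consisting only of a string literal gets
# an implied verb chosen from its final punctuation.
sub implied_verb {
    my ($sentence) = @_;
    return unless $sentence =~ /^\s*"[^"]*"\s*[.?!]?\s*$/;
    my $p = $sentence =~ /([.?!])["\s]*$/ ? $1 : '.';
    return $p eq '?' ? 'ask' : $p eq '!' ? 'exclaim' : 'say';
}

my @scope = qw(goo goo lamp);
my @items = map { resolve($_, @scope) } split_dobjects('all goo and the lamp');
print "@items\n";                               # prints "goo goo lamp"
print implied_verb('"where is floyd?"'), "\n";  # prints "ask"
```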

Multiple dobjects amount to a simple iteration: the sentence is applied once, identically, to each dobject in turn. Throw an exception if you need to interrupt the processing partway through the list.
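A sketch of that iteration, assuming each verb handler is a code reference that receives a single-dobject copy of the sentence. Perl's die and eval stand in for throwing and catching exceptions; the handler, the run_for_each name and the log format are hypothetical.

```perl
use strict;
use warnings;

# Apply one parsed sentence to each direct object in turn. A handler may
# die() to interrupt the remaining iterations.
sub run_for_each {
    my ($handler, $cmd) = @_;
    my @log;
    for my $dobj (@{ $cmd->{dobjects} }) {
        # Same sentence, but with a single dobject.
        my %single = (%$cmd, dobject => $dobj);
        delete $single{dobjects};
        my $ok = eval { push @log, $handler->(\%single); 1 };
        unless ($ok) {
            chomp(my $err = $@);
            push @log, "stopped: $err";
            last;
        }
    }
    return @log;
}

# Hypothetical "take" handler that throws when asked to pick up a grue.
my $take = sub {
    my ($c) = @_;
    die "the $c->{dobject} bites back\n" if $c->{dobject} eq 'grue';
    return "taken: $c->{dobject}";
};

print "$_\n" for run_for_each($take, { verb => 'take', dobjects => [qw(lamp grue sword)] });
```

Here the lamp is taken, the grue throws, and the sword is never attempted.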

Iobjects are always singular prepositional targets. An alternative sentence syntax allows the iobject to precede the dobject; the parser simply swaps them and supplies a default preposition, so (give floyd the broom) becomes (give the broom to floyd). This form is detected while parsing by noticing two adjacent noun phrases with no comma or 'and' between them.
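A sketch of detecting the swapped form: group tokens into noun phrases using the [ article ] [ adjectives ] noun rule (here a phrase ends at a known noun), and treat exactly two adjacent phrases with no separator between them as iobject followed by dobject. The vocabulary, the per-verb default prepositions and the function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical vocabulary and per-verb default prepositions.
my %NOUNS        = map { $_ => 1 } qw(floyd broom lamp troll);
my %PREPS        = map { $_ => 1 } qw(to at in on with);
my %DEFAULT_PREP = (give => 'to', show => 'to', throw => 'at');

# "and", commas and prepositions separate noun phrases.
sub is_sep {
    my ($w) = @_;
    return $w eq 'and' || $w eq ',' || $PREPS{$w};
}

# Group tokens into noun phrases; a phrase ends at a known noun, and
# separator tokens are passed through on their own.
sub noun_phrases {
    my (@out, @cur);
    for my $w (@_) {
        if (is_sep($w)) {
            push @out, join(' ', @cur) if @cur;
            @cur = ();
            push @out, $w;
            next;
        }
        push @cur, $w;
        if ($NOUNS{$w}) {
            push @out, join(' ', @cur);
            @cur = ();
        }
    }
    push @out, join(' ', @cur) if @cur;
    return @out;
}

# "verb NP1 NP2" with nothing between the two phrases means NP1 is really
# the iobject: swap them and supply the verb's default preposition.
sub normalize {
    my ($text) = @_;
    my ($verb, @tokens) = lc($text) =~ /\w+|,/g;
    my @np = noun_phrases(@tokens);
    if (@np == 2 && !grep({ is_sep($_) } @np) && exists $DEFAULT_PREP{$verb}) {
        return "$verb $np[1] $DEFAULT_PREP{$verb} $np[0]";
    }
    return join ' ', $verb, @np;
}

print normalize('give floyd the broom'), "\n";    # prints "give the broom to floyd"
```

Because phrase boundaries come from a known-noun list, an unknown word simply extends the current phrase; a fuller parser would need a complete vocabulary or a disambiguation step.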

There's a lot more to my scheme; as I said, I have developed the code, but it's not something I can freely share in detail at this time. You're welcome to e-mail me for other ideas, though.

--
[ e d @ h a l l e y . c c ]


In reply to Re: Parsing english by halley
in thread Parsing english by wolis
