I don't think you realize what you've gotten into. Research into natural language processing (the official term) is at least 40 years old. Browse CiteULike for NLP, NLG (natural language generation), and NLU (natural language understanding). Also search Google Scholar.

The topic is extremely theoretical and relies heavily on Chomsky's theories of grammar, as well as tree theory. There are two main approaches: McKeown's (a name you will see a lot) is usually based on nested templates (IIRC) and requires a massive database of grammatical structures. Other approaches rely on deep semantic representations of the knowledge that is implied by grammatical structures.

A couple of things that you have failed to consider, each of which has massive amounts of discussion in the literature:
- Referring expressions (him, her, that, etc.)
- Focus (what's the subject? Is the sentence really _about_ the subject?)
- Homonyms (semantic homonyms, at least)
- fifty billion other things...

The dismal state of machine translation ought to indicate how far we still have to go. Babelfish is rather state of the art, actually.

Good luck.

~e

In reply to Re^2: Theory time: Sentence equivalence by eweaver
in thread Theory time: Sentence equivalence by BUU
