The goal of a NLP program fluent in any two given languages is to provide the same capabilities that a person fluent in two languages would provide between those two languages. If a person fluent in English and some other language were to be asked "Translate the sentence Fire.", that person would ask for context. A properly-written NLP program would do the same thing. The same goes for your list of words grabbed at random from /usr/dict.

The 'other' method of achieving the goal, that of translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning that is embodied within natural language into some machine encodable intermediate language. So that once so encoded, the translation to other langauges can be done by applying a set of language specific "construction rules", is even less feasible.

Why is that so? I think that the problem is a problem of algorith and data structure. Most attempts I've seen (and my cousin wrote her masters on the very topic ... in French) attempts to follow standard sentence deconstruction, the kind you learned in English class. I think that this method fails to understand the purpose of language.

Language, to my mind, is meant to convey concepts. Very fuzzy, un-boxable concepts. But, the only intermediate language we have is, well, language. So, we encode in a very lossy algorithm to words, phrases, sentences, and paragraphs. Then, the listener decodes in a similarly lossy algorithm (which isn't the same algorithm anyone else would use to decode the same text) into their framework of concepts. Usually, the paradigms are close enough or the communication is generic enough that transmission of concepts is possible. However, there are many instances, and I'm sure each of us has run into one, where the concepts we were trying to communicate did not get through. And, this is a problem, as you noted, not just a problem between speakers of different languages, but also between fluent speakers of the same language.

I would like to note that such projects of constructing an intermediate language have successfully occurred in the past. The most notable example of this is the Chinese writing system. There are at least 5 major languages that use the exact same writing system. Thus, someone who speaks only Mandarin can communicate just fine with someone who speaks only Cantonese, solely by written communication. There are other examples, but none as wide-reaching. So, it is a feasible idea. And, I think, it's a critical idea. If we can come up with an intermediate language representing the actual concepts being communicated, that would revolutionize philosophy, linguistics, computer science, and a host of other fields. It's not a matter of whether this project is worthwhile. I think it's a matter of we cannot afford to not do it.

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.


In reply to Re^2: X-Prize: Natural Language Processing by dragonchild
in thread X-prize software challenge? by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.