in reply to X-Prize: Natural Language Processing
in thread X-prize software challenge?
I think that this is not a good software X-prize contender because:
Input: "Mich interessiert das Thema, weil ich fachlich/ beruflich mit Betroffenen zu tun habe."
Output: "Engaged computing means computational processes that are engaged with the world—think embedded systems."
Both sentences are (probably) pretty well-formed in their respective languages.
However, the two sentences (probably) have very little common meaning, as I picked them at random off the net.
The problem with every definition I've seen of "Natural Language Processing" is that it assumes it is possible to encode all of the syntactic and semantic information contained in a piece of meaningful, correct* text in such a way that all of that information can be embodied in some other language. It also assumes that the same can be done with all the meta-information that the human brain divines from auxiliary clues, like context; previous awareness of the writer's style; attitudes and prejudices; and a whole lot more besides.
*How do we deal with almost correct input?
Even a single word can have many meanings, which a human being can in many cases divine through context. E.g.
Fire.
In the absence of any meta-clues, there are at least 3 or 4 possible interpretations of that single word. Chances are that English is the only language in which those 3 or 4 meanings use the same word.
Then there are phrases like: "Oh really". Without context, that can be a genuine enquiry, or pure sarcasm. In many cases, even native English speakers are hard pushed to discern the intended meaning, even with the benefit of hearing the spoken inflection and being party to the context.
Indeed, whenever the use of language moves beyond the simplest, purely descriptive use, the meaning heard by the listener (or read by the reader) is as much a function of the listener's or reader's experiences, biases and knowledge as it is of the speaker's or writer's.
How often, even in this place with its fairly constrained focus, do half a dozen readers come away with different interpretations of a writer's words?
If you translate a document from one language to another word by word, you usually end up with garbage. If you translate phrase by phrase, you need a huge lookup table of all the billions of possible phrases, and you might end up with something that reads more fluently, but there are two problems.
How do you encapsulate the variability between what the writer intended to write and what the reader read? And more so, the differences in meaning perceived between two or more readers reading the same words? Or the same reader, reading the same words in two or more different contexts?
Using the "huge lookup table" method, the magnitude of the problem is not the hardware problem of storing the translation databases and retrieving from them quickly. The problem is constructing them in the first place.
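To make the word-by-word case concrete, here is a minimal sketch in Python. The glossary is a hand-made, hypothetical one that I typed in purely for illustration (one English gloss per German word, chosen without regard to context); it is not taken from any real dictionary or translation library, and the function name is likewise made up.

```python
import re

# Hypothetical, hand-made glossary: exactly one English gloss per German word.
# A real dictionary would list several senses for most of these entries.
GLOSSARY = {
    "mich": "me",
    "interessiert": "interests",
    "das": "the",
    "thema": "topic",
    "weil": "because",
    "ich": "I",
    "fachlich": "professionally",
    "beruflich": "occupationally",
    "mit": "with",
    "betroffenen": "affected",
    "zu": "to",
    "tun": "do",
    "habe": "have",
}


def translate_word_by_word(sentence, glossary):
    """Replace each word with its single gloss, keeping the source word order.

    Words missing from the glossary are passed through in angle brackets.
    """
    # Runs of letters only; punctuation and digits are dropped.
    words = re.findall(r"[^\W\d_]+", sentence)
    return " ".join(glossary.get(w.lower(), f"<{w}>") for w in words)


source = ("Mich interessiert das Thema, weil ich fachlich/ beruflich "
          "mit Betroffenen zu tun habe.")
print(translate_word_by_word(source, GLOSSARY))
# prints: me interests the topic because I professionally occupationally with affected to do have
```

Every word in the output is a defensible gloss, yet the result is not English. Fixing that means knowing about word order, idiom ("zu tun haben mit") and which sense of each word was meant, which is exactly the information a simple lookup does not carry.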
The 'other' method of achieving the goal is to translate all of the syntactic, semantic, contextual, environmental and every other "...al" meaning that is embodied within natural language into some machine-encodable intermediate language, so that, once so encoded, the translation to other languages can be done by applying a set of language-specific "construction rules". That is even less feasible.
I think the problem with Natural Language Processing is that, as yet, even the cleverest and most fluent speakers (in any language) have not found a way to use natural language to convey exact meanings, even to others fluent in the same language.
Until humans can achieve this with a reasonable degree of accuracy and precision, writing a computer program to do it is a non-starter.