in reply to Re: X-prize Suggestions here please!
in thread X-prize software challenge?
Criteria
The criteria here are going to be a little vague, but hopefully we can expand on them.
Being right does not endow the right to be rude; politeness costs nothing.
Being unknowing is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals for the only goals, your opinion for the only opinion, or your confidence for correctness. Saying you know better is not the same as explaining why you know better.
Replies are listed 'Best First'.
Re: X-Prize: Natural Language Processing
by BrowserUk (Patriarch) on Oct 16, 2004 at 16:33 UTC
I think that this is not a good software X-prize contender, for the following reasons.

The problem with every definition I've seen of "Natural Language Processing" is that it assumes it is possible to encode not only the syntactic and semantic information contained in a piece of meaningful, correct* text in such a way that all of that information can be embodied in some other language, but also all the meta-information that the human brain divines from auxiliary clues: context; previous awareness of the writer's style, attitudes and prejudices; and a whole lot more besides.

*How do we deal with almost-correct input?

Even a single word can have many meanings, which a human being can in many cases divine through context. E.g. "Fire." In the absence of any meta-clues, there are at least 3 or 4 possible interpretations of that single word. Chances are that English is the only language in which those 3 or 4 meanings use the same word. Then there are phrases like "Oh really". Without context, that can be a genuine enquiry, or pure sarcasm. In many cases, even native English speakers are hard pushed to discern the intended meaning, even with the benefit of hearing the spoken inflection and being party to the context.

Indeed, whenever the use of language moves beyond the simplest of purely descriptive use, the meaning heard by the listener (or read by the reader) is as much a function of the listener's or reader's experiences, biases and knowledge as it is of the speaker's or writer's. How often, even in this place with its fairly constrained focus, do half a dozen readers come away with different interpretations of a writer's words?

If you translate a document from one language to another word-by-word, you usually end up with garbage. If you translate phrase by phrase, you need a huge lookup table of all the billions of possible phrases, and you might end up with something that reads more fluently, but each approach hides a killer problem. With the "huge lookup table" method, the magnitude of the problem is not the hardware problem of storing and quickly retrieving the translation databases; the problem is constructing them in the first place. The 'other' method of achieving the goal, that of translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning that is embodied within natural language into some machine-encodable intermediate language, so that once so encoded, the translation to other languages can be done by applying a set of language-specific "construction rules", is even less feasible.

I think the problem with Natural Language Processing is that, as yet, even the cleverest and most fluent speakers (in any language) have not found a way to use natural language to convey exact meanings even to others fluent in the same language. Until humans can achieve this with a reasonable degree of accuracy and precision, writing a computer program to do it is a non-starter.
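To make the word-by-word failure concrete, here is a minimal Perl sketch; the dictionary entries are invented for illustration, and a real system would need far more than this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A toy English-to-French dictionary. Each English word maps to a
    # single "translation", even where several senses exist -- exactly
    # the information a word-by-word scheme throws away.
    my %en2fr = (
        fire  => 'feu',     # the noun sense; 'fire a gun', 'fire an employee' are lost
        the   => 'le',      # French articles are gendered; 'la' is never chosen
        house => 'maison',
        is    => 'est',
        on    => 'sur',     # but 'on fire' is 'en feu' -- the idiom is unreachable
    );

    sub word_by_word {
        my ($sentence) = @_;
        return join ' ',
            map { exists $en2fr{lc $_} ? $en2fr{lc $_} : "[$_?]" }
            split /\s+/, $sentence;
    }

    # "La maison est en feu" is the fluent rendering; we get word salad.
    print word_by_word('the house is on fire'), "\n";
    # prints: le maison est sur feu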
by dragonchild (Archbishop) on Oct 17, 2004 at 00:25 UTC
"The 'other' method of achieving the goal, that of translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning that is embodied within natural language into some machine-encodable intermediate language ... is even less feasible."

Why is that so? I think that the problem is a problem of algorithm and data structure. Most attempts I've seen (and my cousin wrote her masters on the very topic ... in French) attempt to follow standard sentence deconstruction, the kind you learned in English class. I think that this method fails to understand the purpose of language.

Language, to my mind, is meant to convey concepts. Very fuzzy, un-boxable concepts. But the only intermediate language we have is, well, language. So we encode, with a very lossy algorithm, into words, phrases, sentences, and paragraphs. Then the listener decodes, with a similarly lossy algorithm (which isn't the same algorithm anyone else would use to decode the same text), into their own framework of concepts. Usually the paradigms are close enough, or the communication is generic enough, that transmission of concepts is possible. However, there are many instances, and I'm sure each of us has run into one, where the concepts we were trying to communicate did not get through. And this is, as you noted, not just a problem between speakers of different languages, but also between fluent speakers of the same language.

I would like to note that such projects of constructing an intermediate language have successfully occurred in the past. The most notable example is the Chinese writing system. There are at least 5 major languages that use the exact same writing system. Thus, someone who speaks only Mandarin can communicate just fine with someone who speaks only Cantonese, solely by written communication. There are other examples, but none as wide-reaching. So it is a feasible idea.

And, I think, it's a critical idea. If we can come up with an intermediate language representing the actual concepts being communicated, that would revolutionize philosophy, linguistics, computer science, and a host of other fields. It's not a matter of whether this project is worthwhile; I think it's a matter of whether we can afford not to do it.

Being right does not endow the right to be rude; politeness costs nothing.
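By way of illustration of the interlingua idea, a toy Perl sketch modelled loosely on the shared-writing-system analogy; the concept tags and romanised surface forms are invented for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Language-neutral concept tags, analogous to shared written characters:
    # each language supplies only a surface form for each tag.
    my %render = (
        mandarin  => { PERSON => 'ren',    EAT => 'chi',  RICE => 'fan'  },
        cantonese => { PERSON => 'jan',    EAT => 'sik',  RICE => 'faan' },
        english   => { PERSON => 'person', EAT => 'eats', RICE => 'rice' },
    );

    # A "message" is a sequence of concept tags, not words in any language.
    my @message = qw(PERSON EAT RICE);

    for my $lang (sort keys %render) {
        my @words = map { $render{$lang}{$_} } @message;
        printf "%-11s %s\n", "$lang:", join ' ', @words;
    }

    # The hard part, of course, is everything this sketch omits: getting
    # from real sentences to concept tags (the lossy encoding) and from
    # tags back to fluent sentences (grammar, word order, register).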
by BrowserUk (Patriarch) on Oct 17, 2004 at 02:03 UTC
"... that person would ask for context. A properly-written NLP program would do the same thing."

If I sent you a /msg saying "Why do you deprecate tie and not bless?", you would immediately be able to respond to that question. Ask an (existing) NLP to translate that into and from any other language, or even a host of other languages, and you get: That looked damned impressive when I pasted it, and rubbish now I've submitted it :( I'm only vaguely fluent in one of those languages, and I would be hard pushed to recognise the question I asked, even though I am completely aware of the context, background and content.

How can an NLP "ask for context"? Most of the context of that question is completely absent from this post and from this entire thread; some of it depends upon information only conveyed between us (you and I) through private communications. Without having all of the background information, and/or one of us to interrogate, can you ever imagine an NLP being able to communicate the essence of the question I am asking in another language?

Even a human being, fluent in English and whichever other language(s) we want it translated into, would be extremely hard pushed to convey the essence of that question unless they also have a pretty intimate knowledge of not just programming in general, but Perl 5 specifically. Indeed, the question would be confusing and essentially meaningless, even in English, to anyone without the requisite background.

And that's my point. Human speech shorthands so much, on the basis of the speaker's knowledge of the listener's knowledge and experience. Try the mental exercise of working out just how much extra information would be required to allow another native English speaker, who has no knowledge of computers or Perl, to understand that question. I seriously doubt it could be done in less than 50,000 words. Now imagine trying to translate those 50,000 words into Navaho, or Inuit, such that a native speaker of those languages without computer and Perl 5 experience could understand it.

By now, you're probably thinking: "But the recipient of such a message would have that experience, otherwise you wouldn't be asking them that question", and you would be right. But it is my contention that if the NLP is going to be able to convey the essence of the words 'tie' and 'bless' in the question, in suitably non-bondage-related and non-religious-related terms in the target language, it would need that same knowledge. Of course, then you might say: "If the recipient knows Perl programming, then there is no need, and it would in fact be detrimental, to translate those terms at all". But then the NLP has to have that knowledge in order to know not to translate those two terms. It would also need to 'know' that the recipient had the knowledge to understand the untranslated terms!

Apply that same logic to conversation between neurosurgeons, or particle physicists, or sushi chefs, or hair-stylists, or mothers. Spoken and written language is rife with supposed knowledge and contextual inference. Just as I would have extreme difficulty trying to explain the background of the question to a Japanese sushi chef, he would have extreme difficulty explaining the process of preparing blowfish to me. Not only can I see no way to encapsulate all that disparate knowledge into a computer program, neither can I see how to program the computer to ask the right questions to allow the translation of such information.

"And, I think, it's a critical idea. If we can come up with an intermediate language representing the actual concepts being communicated, that would revolutionize philosophy, linguistics, computer science, and a host of other fields. It's not a matter of whether this project is worthwhile; I think it's a matter of whether we can afford not to do it."

I agree with the sentiment of this, but not the approach. Not because I wouldn't like it to succeed, but because I simply do not see the time when this will be feasible. Even with Moore's Law working for us (for how much longer?), I do not see the means by which it would be achievable.

I also think that the underlying problem will be resolved before we find a technological solution to it, in a rather more prosaic, but ultimately more practical, fashion. I think that over time the diversity of human language will steadily reduce until the problem "goes away". I suspect that a single common language will become universal. I doubt it will be recognisably any one of the currently existing languages; more a bastardisation of several of the more widely spoken ones all run together. I imagine that there will be a fairly high content of English (because it's the lingua franca in so many fields already), French (because the French are stubborn enough to ensure it; besides which, it's too nice a language to allow to die), Chinese and one of the Indian subcontinental languages (because between them they cover about a third of the world's population), and probably many bits of many others.

Basically, we (our children('s children?)) will all become bilingual: our 'native' tongues, and "worldspeak". Always assuming that we don't run out of fuel, water or air before then!
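A minimal Perl sketch of just one corner of that knowledge problem, the decision not to translate 'tie' and 'bless'; the glossary and the recipient flag are invented for illustration, and they stand in for what would really be a vast model of both the domain and the reader:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Terms that must NOT be translated literally -- but only if the
    # recipient shares the domain. Encoding even this one decision
    # requires knowledge about both the subject and the reader.
    my %perl_jargon = map { ($_ => 1) } qw(tie bless wantarray);

    sub translate_word {
        my ($word, $recipient_knows_perl) = @_;
        if ($perl_jargon{$word}) {
            # Perl-literate reader: pass the term through untouched.
            return $word if $recipient_knows_perl;
            # Otherwise the honest answer is a gloss, not a translation.
            return "[Perl term '$word': needs a paragraph of explanation]";
        }
        return dictionary_lookup($word);    # the ordinary path (stubbed below)
    }

    # Stand-in for a real bilingual dictionary.
    sub dictionary_lookup { my ($w) = @_; return "<$w>" }

    print translate_word('bless', 1), "\n";  # bless
    print translate_word('bless', 0), "\n";  # [Perl term 'bless': ...]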
by dragonchild (Archbishop) on Oct 17, 2004 at 02:54 UTC
Re: X-Prize: Natural Language Processing
by hardburn (Abbot) on Oct 15, 2004 at 16:09 UTC
Suggestion: Give a more rigorous testing style for the "arbitrarily chosen native speaker". Something like:
This will probably need to be modified further, but should be a good start. It also adds the requirement that the program can talk over IRC, but I doubt that would be a challenge for anyone implementing a natural-language processor :)

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
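For what it's worth, the IRC requirement really is the easy part. A minimal sketch using Bot::BasicBot from CPAN; the server, channel and canned reply are placeholders, and all the hard work would live behind respond_to():

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Minimal IRC front-end for a would-be natural-language processor.
    # Everything hard is hidden behind the respond_to() stub.
    package NLPBot;
    use base 'Bot::BasicBot';

    sub said {
        my ($self, $message) = @_;
        # $message->{body} holds the text addressed to the bot;
        # returning a string makes the bot reply in channel.
        return respond_to($message->{body});
    }

    sub respond_to {
        my ($text) = @_;
        # Stub: a real entry would put its language engine here.
        return "I heard: $text";
    }

    package main;

    NLPBot->new(
        server   => 'irc.example.org',    # placeholder server
        channels => ['#xprize-test'],     # placeholder channel
        nick     => 'nlp_contender',
    )->run();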
Re: X-Prize: Natural Language Processing
by pmtolk (Acolyte) on Oct 17, 2004 at 19:55 UTC
This is an excerpt from the book "The Use and Misuse of Language"; the article, by Edmund S. Glenn, is entitled "Semantic Difficulties in International Communication". Good and cheap book, worth the read.

Glenn argues that the difficulty of transmitting the ideas of one national or cultural group to another is not merely a problem of language, but is more a matter of the philosophy of the individual(s) communicating, which determines how they see things and how they express their ideas. Philosophies or ideas, he feels, are what distinguish one cultural group from another.

"...what is meant by (national character) is in reality the embodiment of a philosophy or the habitual use of a method of judging and thinking." (p. 48)

"The determination of the relationship between the patterns of thought of the cultural or national group whose ideas are to be communicated, to the patterns of thought of the cultural or national group which is to receive the communication, is an integral part of international communication. Failure to determine such relationships, and to act in accordance with such determinations, will almost unavoidably lead to misunderstandings."

Glenn gives examples of differences of philosophy causing communication misunderstandings among nations, based on UN debates, and also some examples that might be experienced by cross-cultural couples. For example: to the English, "No" means no; to an Arab, "No" means yes, but let's negotiate or discuss further (a "real" no has added emphasis); Indians say no when they mean yes regarding food or hospitality offered.