in reply to X-prize software challenge?
Hardburn suggested that a specific subthread for entries would be a good idea, and I agree.
Post your suggestion as a reply to this node, and leave the parent thread for discussion arising from the parent node itself.
X-Prize: Master-level Go Program
by dragonchild (Archbishop) on Oct 15, 2004 at 15:19 UTC
A program that, on off-the-shelf hardware, plays Go at a master level (1 dan, for instance).

Criteria

Being right, does not endow the right to be rude; politeness costs nothing.
by BrowserUk (Patriarch) on Oct 15, 2004 at 15:25 UTC
Suggestions:
- Split this node into two, leaving one idea here and the other at the same level.
- Clarify the "goal" of each challenge.
- Add a section denoting the judgement criteria for each.

Update: Change the title of each post to something like "X-prize: Natural language processing" and "X-prize: Master-level Go program". Thanks.
by DrHyde (Prior) on Oct 15, 2004 at 16:16 UTC
X-Prize: Natural Language Processing
by dragonchild (Archbishop) on Oct 15, 2004 at 15:49 UTC
To create a program (with any needed hardware) that can translate any arbitrary piece of text from any human language to any other human language.

Criteria

Being right, does not endow the right to be rude; politeness costs nothing.
by BrowserUk (Patriarch) on Oct 16, 2004 at 16:33 UTC
I think that this is not a good software X-prize contender, because the problem with every definition I've seen of "Natural Language Processing" is that it assumes it is possible to encode the syntactic and semantic information contained in a piece of meaningful, correct* text in such a way that all of that information can be carried over into some other language. It also assumes the same for all the meta-information that the human brain divines from auxiliary clues: context, previous awareness of the writer's style, attitudes and prejudices, and a whole lot more besides.

*How do we deal with almost-correct input?

Even a single word can have many meanings, which a human being can in many cases divine through context. E.g. "Fire". In the absence of any meta-clues, there are at least three or four possible interpretations of that single word, and chances are that English is the only language in which those three or four meanings use the same word. Then there are phrases like "Oh really". Without context, that can be a genuine enquiry or pure sarcasm. In many cases, even native English speakers are hard pushed to discern the intended meaning, even with the benefit of hearing the spoken inflection and being party to the context.

Indeed, whenever the use of language moves beyond the simplest of purely descriptive use, the meaning heard by the listener (or read by the reader) is as much a function of the listener's or reader's experiences, biases and knowledge as it is of the speaker's or writer's. How often, even in this place with its fairly constrained focus, do half a dozen readers come away with different interpretations of a writer's words?

If you translate a document from one language to another word by word, you usually end up with garbage. If you translate phrase by phrase, you need a huge lookup table of all the billions of possible phrases, and you might end up with something that reads more fluently, but there are two problems. With the "huge lookup table" method, the magnitude of the problem is not the hardware problem of storing and quickly retrieving the translation databases; the problem is constructing them in the first place. The 'other' method of achieving the goal, that of translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning embodied within natural language into some machine-encodable intermediate language, so that once so encoded the translation to other languages can be done by applying a set of language-specific "construction rules", is even less feasible.

I think the problem with Natural Language Processing is that, as yet, even the cleverest and most fluent speakers (in any language) have not found a way to use natural language to convey exact meanings even to others fluent in the same language. Until humans can achieve this with a reasonable degree of accuracy and precision, writing a computer program to do it is a non-starter.
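To make the word-by-word point concrete, here is a rough sketch (the dictionary entries are invented for illustration, nothing more) of why naive lookup translation falls apart as soon as a word has more than one sense:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A toy English-to-French word dictionary. Real words can map to several
    # senses; a word-by-word translator has no way to pick the right one.
    my %dict = (
        fire => [ 'feu',        # combustion
                  'incendie',   # a blaze
                  'licencier',  # to dismiss from a job
                  'tirer' ],    # to shoot a weapon
        the  => [ 'le' ],
        is   => [ 'est' ],
        out  => [ 'dehors' ],
    );

    sub translate_word_by_word {
        my ($sentence) = @_;
        return join ' ', map {
            my $senses = $dict{ lc $_ } || [ "<$_?>" ];
            $senses->[0];           # blindly take the first sense
        } split /\s+/, $sentence;
    }

    # "The fire is out" (the blaze has been extinguished) comes back as
    # "le feu est dehors", i.e. "the fire is outside": wrong sense, wrong meaning.
    print translate_word_by_word("The fire is out"), "\n";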
by dragonchild (Archbishop) on Oct 17, 2004 at 00:25 UTC
The 'other' method of achieving the goal, that of translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning embodied within natural language into some machine-encodable intermediate language, so that once so encoded the translation to other languages can be done by applying a set of language-specific "construction rules", is even less feasible.

Why is that so? I think that the problem is one of algorithm and data structure. Most attempts I've seen (and my cousin wrote her master's on the very topic ... in French) try to follow standard sentence deconstruction, the kind you learned in English class. I think that this method fails to understand the purpose of language.

Language, to my mind, is meant to convey concepts. Very fuzzy, un-boxable concepts. But the only intermediate language we have is, well, language. So we encode, with a very lossy algorithm, into words, phrases, sentences, and paragraphs. Then the listener decodes, with a similarly lossy algorithm (which isn't the same algorithm anyone else would use to decode the same text), into their framework of concepts. Usually the paradigms are close enough, or the communication is generic enough, that transmission of concepts is possible. However, there are many instances, and I'm sure each of us has run into one, where the concepts we were trying to communicate did not get through. And this is, as you noted, not just a problem between speakers of different languages, but also between fluent speakers of the same language.

I would like to note that such projects of constructing an intermediate language have successfully occurred in the past. The most notable example is the Chinese writing system. There are at least 5 major languages that use the exact same writing system. Thus, someone who speaks only Mandarin can communicate just fine with someone who speaks only Cantonese, solely by written communication. There are other examples, but none as wide-reaching. So, it is a feasible idea.

And, I think, it's a critical idea. If we can come up with an intermediate language representing the actual concepts being communicated, that would revolutionize philosophy, linguistics, computer science, and a host of other fields. It's not a matter of whether this project is worthwhile; I think it's a matter of our not being able to afford not to do it.

Being right, does not endow the right to be rude; politeness costs nothing.
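As a purely illustrative sketch (the concept structure, lexicons and the two rule sets below are invented, not any existing interlingua), the idea is that a single language-neutral representation of a concept gets rendered into each language by that language's own construction rules:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A hypothetical language-neutral "concept" record: who does what to what.
    my $concept = {
        action => 'POSSESS',
        agent  => { entity => 'SPEAKER', number => 'singular' },
        theme  => { entity => 'DOG',     number => 'singular' },
    };

    # Tiny invented lexicons mapping concept symbols to surface words.
    my %lexicon = (
        english  => { SPEAKER => 'I',       DOG => 'a dog', POSSESS => 'have' },
        japanese => { SPEAKER => 'watashi', DOG => 'inu',   POSSESS => 'katte imasu' },
    );

    # Per-language construction rules: English is subject-verb-object,
    # Japanese is subject-object-verb with particles.
    my %rules = (
        english  => sub { my ($l, $c) = @_;
            join ' ', $l->{ $c->{agent}{entity} }, $l->{ $c->{action} },
                      $l->{ $c->{theme}{entity} } },
        japanese => sub { my ($l, $c) = @_;
            join ' ', $l->{ $c->{agent}{entity} }, 'wa',
                      $l->{ $c->{theme}{entity} }, 'o', $l->{ $c->{action} } },
    );

    for my $lang (sort keys %rules) {
        print "$lang: ", $rules{$lang}->( $lexicon{$lang}, $concept ), "\n";
    }

The hard part, of course, is everything this sketch waves away: building the concept records from real text in the first place.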
by BrowserUk (Patriarch) on Oct 17, 2004 at 02:03 UTC
by dragonchild (Archbishop) on Oct 17, 2004 at 02:54 UTC
by hardburn (Abbot) on Oct 15, 2004 at 16:09 UTC
Suggestion: Give a more rigorous testing style for the "arbitrarily chosen native speaker". Something like:
This will probably need to be modified further, but should be a good start. It also adds the requirement that the program can talk over IRC, but I doubt that would be a challenge for anyone implementing a natural-language processor :)

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
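For what it's worth, here is a minimal sketch of the IRC side, using Bot::BasicBot from CPAN; the natural_language_reply() routine, the server name and the channel are placeholders standing in for whatever processor and test setup the judges actually use:

    #!/usr/bin/perl
    package NLTestBot;
    use strict;
    use warnings;
    use base 'Bot::BasicBot';

    # Stand-in for the program under test: takes the judge's utterance,
    # returns the program's response.
    sub natural_language_reply {
        my ($text) = @_;
        return "You said: $text";    # placeholder only
    }

    # Bot::BasicBot calls said() for each message; returning a string
    # sends it back to wherever the message came from.
    sub said {
        my ($self, $message) = @_;
        return natural_language_reply( $message->{body} );
    }

    package main;
    NLTestBot->new(
        server   => 'irc.example.org',    # whichever server the judges use
        channels => ['#xprize-test'],
        nick     => 'nlp_entry',
    )->run();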
by pmtolk (Acolyte) on Oct 17, 2004 at 19:55 UTC
which is an excerpt from the book The Use and Misuse of Language; the article, by Edmund S. Glenn, is entitled "Semantic Difficulties in International Communication". Good and cheap book, worth the read.

Glenn, in "Semantic Difficulties in International Communication" (also in Hayakawa), argues that the difficulty of transmitting the ideas of one national or cultural group to another is not merely a problem of language, but is more a matter of the philosophy of the individual(s) communicating, which determines how they see things and how they express their ideas. Philosophies or ideas, he feels, are what distinguish one cultural group from another.

"...what is meant by (national character) is in reality the embodiment of a philosophy or the habitual use of a method of judging and thinking." (p. 48)

"The determination of the relationship between the patterns of thought of the cultural or national group whose ideas are to be communicated, to the patterns of thought of the cultural or national group which is to receive the communication, is an integral part of international communication. Failure to determine such relationships and to act in accordance with such determinations, will almost unavoidably lead to misunderstandings."

Glenn gives examples of differences of philosophy behind communication misunderstandings among nations, based on UN debates, as well as some examples which might be experienced by cross-cultural couples. For example: to the English, No means No; to an Arab, No means yes-but-let's-negotiate-or-discuss-further (a "real" no has added emphasis); and Indians say no when they mean yes regarding food or hospitality offered.
Re^2: X-prize Suggestions here please!
by Jaap (Curate) on Oct 16, 2004 at 12:17 UTC
Create a program that can learn like a baby/child, with only plain text in one natural language (like English) as input and as output. The program will also respond as a baby would, making more sense as it matures.

Criteria
The program must store associative information based on the input given. If the program gets input like "A tree is green.", it must (learn to) store a connection between some entity "tree" and some other entity "green". The program may not have any hard-coded knowledge of verbs, nouns, adjectives, etc.
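A very rough sketch of the storage part only (everything here, including the word-pair approach, is a guess at one possible implementation rather than part of the criteria): associate every pair of words that occur in the same sentence, with no grammatical knowledge at all.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # $assoc{$a}{$b} counts how often words $a and $b appeared in the
    # same sentence. No notion of noun/verb/adjective anywhere.
    my %assoc;

    sub learn {
        my ($sentence) = @_;
        my @words = grep { length } map { lc } split /[^a-z]+/i, $sentence;
        for my $i (0 .. $#words) {
            for my $j ($i + 1 .. $#words) {
                $assoc{ $words[$i] }{ $words[$j] }++;
                $assoc{ $words[$j] }{ $words[$i] }++;
            }
        }
    }

    # "Respond" by babbling the strongest associations of a word it has seen.
    sub babble {
        my ($word) = @_;
        my $links = $assoc{ lc $word } or return "...";
        my @best  = sort { $links->{$b} <=> $links->{$a} } keys %$links;
        return join ' ', lc $word, @best[ 0 .. ($#best < 2 ? $#best : 2) ];
    }

    learn("A tree is green.");
    learn("The green grass grows under the tree.");
    print babble("tree"), "\n";   # the word plus its strongest associations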
by dragonchild (Archbishop) on Oct 17, 2004 at 00:30 UTC
Being right, does not endow the right to be rude; politeness costs nothing.
X-prize: A Knowledge Protocol
by BrowserUk (Patriarch) on Oct 15, 2004 at 19:44 UTC
A Knowledge Protocol.

Goal
TBA

Criteria
When I (and anyone connected to the internet) can supply the following query to the protocol and receive back a few relevant, accurate, location-specific answers:
Description/Justification
That's an ambitious, maybe even arrogant, title, but I'll try to clarify my idea and maybe someone will suggest a better one.

Have you ever gone out on the web looking for information about a particular item you are considering purchasing? I recently needed to replace my 20-year-old microwave oven, so I started out hitting the web sites of one or two of the national chains of white-goods suppliers here in the UK. The result was an intensely frustrating experience.
Next, I tried Google to locate some information on "microwaves 900W UK price" and a whole slew of variations. Half the sites that turn up are US sites. Half of the rest are "comparison shopping" sites that seemingly catch everything. Of those left, actually extracting the knowledge* that I was after from amongst the noise was just too painful (and probably unnecessary) to relate.

So, what I am looking for is a "Knowledge protocol". There is an adage whose provenance I am not sure of, nor could I locate it, but it says that anything (literally anything; words, sounds, numbers, blades of grass, fossilised feces, ash on the carpet, or the absence thereof; anything) is data. Once collated (in some fashion), data can become information. Whether said information is useful to any particular viewer is dependent upon a variety of things. But what I am seeking is not information. Visiting the Ferrari website to look for data about the fuel consumption of their vehicles, I might be presented with a banner informing me that Michael Schumacher's wife's sister has a friend who markets deodorant products for porcines. This may well be "information" (of the FYI or FWIW kind), but it certainly isn't what I went there seeking. It isn't knowledge.

So what would a knowledge protocol allow me to do?

Scenario
I send a query of the form:
to some anonymous email resender* (controversial: but why not use the distributive power of spam for good rather than bad?). The resender forwards the query to anyone who has registered as a respondent to enquiries concerning "Microwave ovens" in the "UK". For the registration process, think along the lines of subscription to newsgroups and mailing lists. The resender forwards the request, devoid of identifying information, to a Knowledge Protocol Port. The daemon responds with:
Of course, there will be those that will either just link to their standard home page, or to a page that carries a redirect to their standard home page, or otherwise try to subvert the rules of the protocol. But here the mailing-list analogy extends to the provision for kill-lists: some way of extending this so that if enough* people place a particular responder on their cheaters-list, then that responder gets de-registered, as a mechanism for keeping responders honest.

This may sound a little like various other things around, say Froogle, but it's not. First, I've read sincere and reasoned discussion that worries whether Google isn't becoming rather too powerful. I'm also not sure, but doesn't Froogle take money to place your goods/services on the index? The whole idea of there being a central registry, or a for-money service, negates the purpose of the protocol. Whilst I would want the protocol to cater for the distribution of commercial information, it should not be limited to, nor dominated by, it.

So, rather than a central server that would require hardware on which to run, and maintenance staff, and salaries, and benefits packages et al., why not utilise the power of Kazaa-style distributed file-sharing protocols? With a suitably defined and simple protocol, leveraging existing experience with things like ftp/html/smtp etc., it should be easy to produce simple clients that would distribute the database in such a way that there is no need for centralisation and all the overheads that brings with it. Every client becomes a part of the distributed service.

Help needed
That pretty much concludes the inspiration and justification for the idea. However, I am having considerable difficulty trying to whittle that down to a single goal. Part of the idea of the parent post is to allow collective thinking to come to bear on such problems, so I am going to leave the definition of the goal open for now, and settle for a loose set of judgement criteria as the starting point. Maybe, if this thread and this post grab enough mindshare to interest people, then both of these will be refined over time to better reflect my aspirations for it.
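Purely as a sketch of the shape such a responder daemon might take (the message field, the port number and the answer format below are all made up for illustration; nothing here is a proposed standard):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # A registered responder listening on its advertised Knowledge Protocol port.
    my $server = IO::Socket::INET->new(
        LocalPort => 7077,          # invented port number
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "listen: $!";

    # The responder answers only for the subjects it registered for
    # ("Microwave ovens" in the UK, in this example).
    my %answers = (
        'microwave ovens' => [
            'model: Acme 900W Combi; price: 89.99 GBP; url: http://example.com/acme900',
            'model: Acme 650W Solo; price: 49.99 GBP; url: http://example.com/acme650',
        ],
    );

    while ( my $client = $server->accept ) {
        chomp( my $query = <$client> );            # e.g. "SUBJECT: microwave ovens"
        my ($subject) = $query =~ /^SUBJECT:\s*(.+)/i;
        my $hits = $answers{ lc($subject // '') } || [];
        print {$client} scalar(@$hits), " answers\n";
        print {$client} "$_\n" for @$hits;
        close $client;
    }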
by tilly (Archbishop) on Oct 15, 2004 at 19:55 UTC
by BrowserUk (Patriarch) on Oct 15, 2004 at 22:43 UTC
I've read a few bits on the semantic web efforts. The problem I see with it is that it is fairly typical of all the latest specifications coming out of the W3C and similar bodies: all-encompassing, over-engineered, heavyweight.

The remarkable thing about most of the early protocols upon which the internet is based is just how lightweight and simple they are. You can connect to a telnet server from a WAP-enabled mobile phone and do everything you could from a fully-fledged terminal. You can connect to a pop3 server and do everything from a command line. The same goes for ftp, sftp, smtp and almost all of the other basic protocols. All the bells and whistles that a good email program, terminal program etc. layer on top are nice-to-haves, but the underlying protocols that drive the network are simple.

What I've seen of the semantic web talks about using XML (already too complicated), XPath (worse) and the Resource Description Framework (hugely complicated). Layers upon layers, complications on top of complications. The simple fundamental principles that stood the early protocols in good stead have been forgotten or ignored.

Question: What makes XML so difficult to process? Answer: You have to read to the end to know where anything is.

The MIME protocol followed early transmission-protocol practices: each part or sub-part is preceded by its length. That way, when you're reading something, you know how much you need to read, and you can choose to stop when you have what you want. XML, on the other hand, forces you to read to the end. You can never be sure that you have anything at all until you have read the closing tag for the first element you receive. That's what makes processing XML as a stream such a pain. XML::Twig and similar tools allow you to pretend that you can process bite-sized chunks, but if at the final step the last close tag is the wrong one, corrupted or missing, then all bets are off, because according to the XML standard it isn't a "well-formed document", and the standard provides for no recovery or partial interpretation. Any independent mechanism, like XML::Twig or the way browsers handle imperfectly formed HTML, is outside of the specification and therefore not subject to any rules. This is why different browsers present the same ill-formed HTML in different ways.

A transmission protocol that didn't provide for error detection and error recovery would be laughed out of court. It's my opinion that any data communication protocol that says "everything must be perfect or we aren't going to play" should equally be laughed out of court. The sad thing is, XML could be fixed in this regard quite easily.

I think that continuing the tradition of:
has a lot of merit. I also think that de-centralisation of the information-provider directory has huge merit. The problem with what I've read of the semantic web is that either every client has to individually contact every possible information provider to gather information, or it has to contact a central information-provider directory service, which requires large volumes of storage and processor power and therefore will need to be funded. Once you get a paid-for (by the client) service, the responses from the service are controlled by commercial interests, and are then possible subjects for paid-for (by the information providers) prioritisation. Once again the clients--you, me and other Joe Public--end up getting to see only what some commercial service provider is paid the most to show us.
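To illustrate the length-prefix point (this is only a toy framing scheme made up for the example, not MIME itself nor any proposed wire format), compare how little work a reader has to do when every part announces its own size up front:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Writer: each part is sent as "<byte length>\n<payload>", so a reader
    # always knows exactly how much to read next.
    sub write_parts {
        my ($fh, @parts) = @_;
        for my $part (@parts) {
            print {$fh} length($part), "\n", $part;
        }
    }

    # Reader: can stop cleanly after any part, and a truncated stream is
    # detected immediately rather than at some far-off closing tag.
    sub read_part {
        my ($fh) = @_;
        defined( my $len = <$fh> ) or return;     # end of stream
        chomp $len;
        my $got = read( $fh, my $payload, $len );
        die "truncated part: expected $len bytes, got $got\n" if $got != $len;
        return $payload;
    }

    # Round-trip demonstration through an in-memory filehandle.
    my $buffer = '';
    open my $out, '>', \$buffer or die $!;
    write_parts( $out, 'subject: microwave ovens', 'location: UK' );
    close $out;

    open my $in, '<', \$buffer or die $!;
    while ( defined( my $part = read_part($in) ) ) {
        print "part: $part\n";
    }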
by BUU (Prior) on Oct 15, 2004 at 20:46 UTC
Addressing your implementation ideas, at first read it sounds like you want all of this to be done manually?! You send an email to the list and everyone reads it and possibly responds?
by BrowserUk (Patriarch) on Oct 15, 2004 at 21:24 UTC
...it sounds like what you want is a search engine that returns actual data... ...at first read it sounds like you want all of this to be done manually?

No. The idea is that information providers (commercial or otherwise) register themselves as responders. Doing this, they would provide an IP/port combination that would respond to the knowledge protocol, and to those query subjects for which they have registered. It is up to each information provider to perform the searching of their own site (though this might be contracted out) in response to the query, and to return the information requested.

The distributed database I mentioned would contain only the registration database, kill-list information, and maybe even an information-provider rating system, but no actual information vis-a-vis answers to queries. All the information delivered by the protocol would be supplied by the information providers in response to queries. If they don't want to release the information into the public domain, they simply do not provide it, and their rating/kill-list position should quickly reflect the fidelity of their responses. The process would be automated, and probably not email-based.
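For concreteness only (the field names and structure are invented; nothing here is part of any defined protocol), the distributed registry might hold records shaped roughly like this, and a client would select responders from it by subject and region while honouring the kill-list:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # One entry in the (distributed) registration database: no answers, just
    # who claims to answer what, plus the community's opinion of them.
    my %registry = (
        'responder-42' => {
            host       => '192.0.2.17',       # documentation-range address
            port       => 7077,               # invented port
            subjects   => [ 'Microwave ovens', 'White goods' ],
            regions    => [ 'UK' ],
            rating     => 4.2,                # aggregate peer rating
            kill_votes => 0,                  # cheater-list votes against it
        },
    );

    # Pick responders for a query, skipping any that the community has
    # effectively de-registered via the kill-list.
    sub responders_for {
        my ($subject, $region) = @_;
        return grep {
            my $r = $registry{$_};
            $r->{kill_votes} < 10
                && grep( lc($_) eq lc($subject), @{ $r->{subjects} } )
                && grep( lc($_) eq lc($region),  @{ $r->{regions}  } );
        } keys %registry;
    }

    print Dumper( [ responders_for( 'microwave ovens', 'uk' ) ] );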