I think the bigger difference is that software is relatively cheap to create, even complex software like operating systems, web browsers, graphics rendering, general-purpose servers, and even languages themselves. The cost for SpaceShipOne to win the $10M Ansari prize has been estimated at over $20M. That doesn't count the amount the other 70-odd entrants spent. It's arguable that nearly $500M was spent on the Ansari prize. It's doubtful that this much has been spent on any major open-source software offering, even Linux.
The other part of the problem is that privately-funded space travel is fungible. Software, intrinsically, is not, notwithstanding the excellent efforts from Redmond. And, given the efforts of Google and others, it's rapidly becoming less fungible.
My feeling is that a good software X-Prize would be something along the lines of true generic natural-language processing or a program that plays Go at the master level. Feasible ... just "Really Hard"™.
There used to be a prize for a Go program, but the person offering it passed away in 1997. Maybe someone much richer than I should take it back up. A master-level Go program on today's hardware would be a quantum advance in certain algorithms.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
Goal
A program that, on off-the-shelf hardware, plays Go at a master level (1 dan, for instance).
Criteria
- Achieve 1 dan in the standard fashion prescribed by the International Go Association, modified solely by the fact that the person making the moves would not be the person playing the game, a la Deep Blue vs. Kasparov.
Suggestions:
Split this node into two, leaving one idea here, and another at the same level.
Clarify the "goal" of each challenge.
Add a section denoting the judgement criteria for each.
Update: Change the title of each post to something like: "Xprize: Natural language processing" and "Xprize: Master level Go program".
Thanks.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
I think that this is not a good software X-prize contender because:
- What constitutes "any human language"?
- LA street slang?
- Egyptian hieroglyphics?
- Chaucer's English?
- Pidgin?
- Bill & Ted speak?
- Clockwork Orange "newspeak"?
- WW2 Navaho Indian code?
- Is this an "arbitrary piece of text"?
platelayers paratrooper spumoni subversive bala womenfolk zealot wangling gym clout proxemic abravanel entryway assimilates faucets dialup's lamellate apparent propositioning olefin froude.
- Neither the goal nor the criteria specify anything about meaning.
Input: "Mich interessiert das Thema, weil ich fachlich/ beruflich mit Betroffenen zu tun habe."
Output: "Engaged computing means computational processes that are engaged with the world—think embedded systems."
Both sentences are (probably) pretty well-formed in their respective languages. The cause of my indecision is that:
- I don't speak German, so I can't comment on the first.
- My native English skills are far from brilliant. The second seems to make sense, and was probably written by a human being, but whether an English language teacher would find it so is a different matter.
However, the two sentences (probably) have very little common meaning, as I picked them at random off the net.
The problem with every definition I've seen of "Natural Language Processing" is that it assumes that it is possible to encode the syntactic and semantic information contained in a piece of meaningful, correct* text in such a way that all of that information can be embodied in some other language. It also assumes that the same can be done for all the meta-information that the human brain divines from auxiliary clues such as context, previous awareness of the writer's style, attitudes and prejudices, and a whole lot more besides.
*How do we deal with almost correct input?
Even a single word can have many meanings, which a human being can in many cases divine through context. E.g.
Fire.
In the absence of any meta-clues, there are at least 3 or 4 possible interpretations of that single word. Chances are that English is the only language in which those 3 or 4 meanings use the same word.
Then there are phrases like: "Oh really". Without context, that can be a genuine enquiry, or pure sarcasm. In many cases, even native English speakers are hard pushed to discern the intended meaning even with the benefit of hearing the spoken inflection and being party to the context.
Indeed, whenever the use of language moves beyond the simplest of purely descriptive use, the meaning heard by the listener (or read by the reader) is as much a function of the listener's or reader's experiences, biases and knowledge as it is of the speaker's or writer's.
How often, even in this place with its fairly constrained focus, do half a dozen readers come away with different interpretations of a writer's words?
If you translate a document from one language to another, word by word, you usually end up with garbage. If you translate phrase by phrase, you need a huge lookup table of all the billions of possible phrases, and you might end up with something that reads more fluently, but there are two problems.
- The mappings of phrase to phrase in each language would need to be done by a human being fluent in both languages (or a pair of native speakers of the two languages who could compare and contrast possible meanings until they arrived at a consensus). This would be a huge undertaking for any single pair of languages; but for all human languages?
- Even then, it's not too hard to sit and construct a phrase in English and a translation of it in a second language that would be correct in some contexts but utterly wrong in others.
How do you encapsulate the variability between what the writer intended to write, and what the reader read? And more so, the differences in meaning perceived between two or more readers reading the same words? Or the same reader, reading the same words in two or more different contexts?
Using the "huge lookup table" method, the magnitude of the problem is not the hardware problem of storing and quickly retrieving the translation databases. The problem is constructing them in the first place.
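To make the word-by-word versus phrase-by-phrase contrast concrete, here is a minimal sketch. The two tiny French-to-English tables and the function names are my own assumptions for illustration only; a real system would need the enormous, human-built tables described above, and even then would have no notion of context.

```python
# Toy French-to-English tables (illustrative assumptions, not real resources).
WORD_TABLE = {"la": "the", "pomme": "apple", "de": "of", "terre": "earth"}
PHRASE_TABLE = {("pomme", "de", "terre"): "potato"}


def translate_word_by_word(text):
    """Look each word up on its own; unknown words pass through unchanged."""
    return " ".join(WORD_TABLE.get(w, w) for w in text.lower().split())


def translate_with_phrases(text):
    """Greedy longest-match against the phrase table, falling back to single words."""
    words = text.lower().split()
    longest = max(len(phrase) for phrase in PHRASE_TABLE)
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched at this position: translate the single word.
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate_word_by_word("la pomme de terre"))   # the apple of earth
print(translate_with_phrases("la pomme de terre"))   # the potato
```

The phrase table rescues this one case, but only because somebody fluent in both languages wrote that entry by hand, which is exactly the construction problem.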
The 'other' method of achieving the goal is even less feasible: translating all of the syntactic, semantic, contextual, environmental and every other "...al" meaning that is embodied within natural language into some machine-encodable intermediate language, so that, once so encoded, the translation to other languages can be done by applying a set of language-specific "construction rules".
I think the problem with Natural Language Processing is that, as yet, even the cleverest and most fluent speakers (in any language) have not found a way to use natural language to convey exact meanings even to others fluent in the same language.
Until humans can achieve this with a reasonable degree of accuracy and precision, writing a computer program to do it is a non-starter.
Suggestion: Give a more rigorous testing style for the "arbitrarily chosen native speaker". Something like:
The tester sits in one room with a computer connected to an IRC server in a private room. Two other users are allowed in the IRC room (but only one of them is in it at once), one of whom is the program and the other a second arbitrarily chosen native speaker. After an hour of questioning, the tester will make a guess as to which user is a program and which is a human. The test is repeated with other native speakers (up to some TBD number of tests). To win, the testers must guess incorrectly at least 50% of the time.
This will probably need to be modified further, but should be a good start. It also adds the requirement that the program can talk over IRC, but I doubt that would be a challenge for anyone implementing a natural-language processor :)
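The win condition could be scored along these lines. This is only a sketch: the function name, the boolean encoding of each tester's guess, and the `threshold` parameter are my assumptions, and the number of tests is still TBD as noted above.

```python
def program_wins(guesses, threshold=0.5):
    """Decide the contest from a list of tester results.

    guesses: one boolean per test, True where the tester correctly
             identified which user was the program.
    threshold: fraction of incorrect guesses needed for the program to win.
    """
    if not guesses:
        raise ValueError("at least one test is required")
    wrong = sum(1 for correct in guesses if not correct)
    return wrong / len(guesses) >= threshold


# Four testers, two of whom were fooled: exactly 50% wrong, so the program wins.
print(program_wins([True, False, False, True]))
```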
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
I cut and pasted the following from A. Cottrell's research on Indian-Western couples living in India,
which is an excerpt from the book "The Use and Misuse of Language"; the article, by Edmund S. Glenn, is entitled
"Semantic Difficulties in International Communication".
Good and cheap book, worth the read.
Glenn, in "Semantic Difficulties in International Communication" (also in Hayakawa), argues that difficulty in transmitting the ideas of one national or cultural group to another is not merely a problem of language, but is more a matter of the philosophy of the individual(s) communicating, which determines how they see things and how they express their ideas. Philosophies or ideas, he feels, are what distinguish one culture group from another. "...what is meant by (national character) is in reality the embodiment of a philosophy or the habitual use of a method of judging and thinking." (p. 48) "The determination of the relationship between the patterns of thought of the cultural or national group whose ideas are to be communicated, to the patterns of thought of the cultural or national group which is to receive the communication, is an integral part of international communication. Failure to determine such relationships and to act in accordance with such determinations, will almost unavoidably lead to misunderstandings." Glenn gives examples, based on UN debates, of differences of philosophy leading to communication misunderstandings among nations. There are also some examples which might be experienced by cross-cultural couples. For example: to the English, No means No; to an Arab, No means yes, but let's negotiate or discuss further (a "real" no has added emphasis) ...Indians say no when they mean yes regarding food or hospitality offered.
A Knowledge Protocol.
Goal
TBA
Criteria
When I (or anyone connected to the internet) can supply the following query to the protocol and receive back a few relevant, accurate, location-specific answers:
Item: Microwave oven
Location: UK
Desired: Price, make, model, size, power.
Description/Justification
That's an ambitious, maybe even arrogant, title, but I'll try to clarify my idea and maybe someone will suggest a better one.
Have you ever gone out on the web looking to find information about a particular item you are considering purchasing?
I recently needed to replace my 20 year old microwave oven. So, I started out hitting the web sites of one or two of the national chains of white goods suppliers here in the UK. The result was an intensely frustrating experience.
- All of them wanted to set cookies even before they had shown me anything that I wanted to see.
No thanks. If I see what I want, and choose to make a purchase from you, I may allow a session cookie that is only valid within the current domain, for the duration of the transaction.
But otherwise: Go F...er.. No thank you very much.
And no amount of "It allows us to give you a better shopping experience" or any of the other lame excuses that I have received as replies to my complaints to sites will change my mind in this.
- Most of them either didn't work at all, or didn't work correctly without javascript enabled.
Bad luck guys! You lost my custom before we even got started.
- Some, fewer these days thank goodness, won't work unless you are using Internet Explorer.
Why the hell would anyone use IE (of any flavour)? It is single-handedly responsible for the transmission of something like 90% of all the viruses, trojans, and other forms of internet nasties (probably 95% if you add Outlook Express). Why force me to use it? The upshot is, if you try, you lose me. (I just wish more people would take my lead and refuse to use websites that do these things; then 'they' would get the message.)
- Flash
Just say no. No, say "No way". Better yet, send sites that insist on using a 300k flash animation to say "Welcome! Click here to continue" a 10 MB flash animation in an email saying "No! No flash! No how, no way. NO! Bye sucker".
- Have you ever looked at how much of a typical retail site page (in terms of screen real-estate area) is actually devoted to the thing that the page purports to be about (the XYV model-pqr microwave, for instance) and how much is a randomly distributed mish-mash of irrelevant information that is either needlessly displayed on every page of the site, or thrown up at random regardless of what it is that you are actually doing?
Why is Google so successful? Because of its superior search engine technology? Maybe, now, but my original criterion for going there was the total lack of crap. With half of the search engines around, you have (or had) to wait 20 minutes for 300kb of crap to arrive and be formatted before you even got to type in your query, and as for the volumes of crap that are (were) presented after the query...
Displayed by dictionary.reference.com after you search for how to spell: "excretia"
Get the Most Popular Sites for "Excretia" #### Yeah, right!
Suggestions:
Exc retia #### No such word. Why offer it?
Exc-retia #### Ditto
Excreta
Excretion
Excrete
Excreter
Excreation
Excreate
Any of the increasing number of Google wannabes that wants to attract my patronage will need to have learned that lesson.
Next, I tried Google to locate some information on "microwaves 900W UK price" and a whole slew of variations. Half the sites that turn up are US sites. Half of the rest are "Comparison shopping" sites that seemingly catch everything. Of those left, actually extracting the knowledge* that I was after, from amongst the noise, was just too painful (and probably unnecessary) to relate.
So, what I am looking for is a "Knowledge Protocol".
There is an adage whose provenance I am not sure of, nor could I locate it, but it says that:
Anything (literally anything; words, sounds, numbers, blades of grass, fossilised feces, ash on the carpet, or the absence thereof; anything) is data.
Once collated (in some fashion), data can become information. Whether said information is useful to any particular viewer is dependent upon a variety of things.
But what I am seeking is not information. Visiting the Ferrari website to look for data about the fuel consumption of their vehicles, I might be presented with a banner informing me that
Michael Schumacher's wife's sister has a friend that markets deodorant products for porcines.
This may well be "information" (of the FYI or FWIW kind), but it certainly isn't what I went there seeking.
It isn't knowledge.
So what would a knowledge protocol allow me to do?
Scenario. I send a query of the form.
Item: Microwave oven
Location: UK
Desired: Price, make, model, size, power.
to some anonymous email resender* (controversial: but why not use the distributive power of spam for good rather than bad?)
The resender forwards the query to anyone who has registered as a respondent to enquiries concerning "Microwave ovens" in "UK". For the registration process, think along the lines of subscription to newsgroups and mailing lists.
The resender forwards the request, devoid of identifying information, to a Knowledge Protocol Port.
The daemon responds with:
- The requested information as defined by the "Desired" card
- To make it commercially interesting, a single url that should lead to a page that expands upon the requested information. And specifically, the requested information.
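The exchange described above might be sketched roughly as follows. Everything here is hypothetical: the `Query`, `Answer` and `Resender` names, the in-memory registry, and the rule that replies missing a requested field are dropped are my own assumptions for illustration, and no transport (email, port, daemon) is modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    item: str
    location: str
    desired: tuple  # names of the fields the asker wants back


@dataclass
class Answer:
    fields: dict  # the requested information
    url: str      # the single permitted follow-up link


@dataclass
class Resender:
    # Maps (item, location) to the responders registered for it.
    registry: dict = field(default_factory=dict)

    def register(self, item, location, responder):
        key = (item.lower(), location.lower())
        self.registry.setdefault(key, []).append(responder)

    def ask(self, query):
        """Forward the bare query (no sender identity) and collect usable answers."""
        key = (query.item.lower(), query.location.lower())
        answers = []
        for responder in self.registry.get(key, []):
            answer = responder(query)
            # Keep only replies that supply every requested field,
            # and strip anything that was not asked for.
            if answer is not None and all(f in answer.fields for f in query.desired):
                trimmed = {f: answer.fields[f] for f in query.desired}
                answers.append(Answer(trimmed, answer.url))
        return answers
```

A responder is just a callable that takes a `Query` and returns an `Answer` (or `None`). Because the responder only ever sees the query itself, the asker stays anonymous, and trimming the reply to the "Desired" fields keeps unrequested noise out.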
Of course, there will be those that will either just link to their standard home page, or to a page that carries a re-direct to their standard home page, or otherwise try to subvert the rules of the protocol. But here the mailing list analogy extends to the provision for kill-lists. This could be extended so that if enough* people place a particular responder on their cheaters-list, then that responder gets de-registered, as a mechanism for keeping responders honest.
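One possible shape for the cheaters-list rule, as a sketch only. The `CheatersList` class, the idea of counting distinct complainants, and the numeric `threshold` standing in for "enough" are all my assumptions; how complaints would be attributed to users without compromising anonymity is left open.

```python
class CheatersList:
    def __init__(self, threshold):
        self.threshold = threshold  # how many distinct complainants count as "enough"
        self.complaints = {}        # responder id -> set of complaining user ids

    def report(self, user, responder):
        """Record one user's complaint; return True if the responder is now de-registered."""
        self.complaints.setdefault(responder, set()).add(user)
        return self.is_deregistered(responder)

    def is_deregistered(self, responder):
        return len(self.complaints.get(responder, ())) >= self.threshold
```

Storing complainants in a set means one annoyed user cannot de-register a responder alone by complaining repeatedly.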
This may sound a little like various other things around, say Froogle, but it's not. First, I've read sincere and reasoned discussion that worries whether Google isn't becoming rather too powerful. I'm also not sure, but doesn't Froogle take money to place your goods/services on the index?
The whole idea of there being a central registry, or a for-money service negates the purpose of the protocol. Whilst I would want the protocol to cater for the distribution of commercial information, it should not be limited to, nor dominated by it.
So, rather than a central server that would require hardware on which to run, and maintenance staff, and salaries, and benefits packages et al., why not utilise the power of Kazaa-style distributed filesharing protocols? With a suitably defined and simple protocol, leveraging existing experience with things like ftp/html/smtp etc., it should be easy to produce simple clients that would distribute the database in such a way that there is no need for centralisation and all the overheads that brings with it. Every client becomes a part of the distributed service.
Help needed
That pretty much concludes the inspiration and justification for the idea. However, I am having considerable difficulty trying to whittle that down to a single goal. Part of the idea of the parent post is to allow collective thinking to come to bear on such problems, so I am going to leave the definition of the goal open for now, and settle for a loose set of Judgement Criteria as the starting point. Maybe, if this thread, and this post grabs enough mindshare to interest people, then both of these will be refined over time to better reflect my aspirations for it.