
A Knowledge Protocol.

Goal

TBA

Criteria

When I (and anyone else connected to the internet) can supply the following query to the protocol, and receive back a few relevant, accurate, location-specific answers to it.

Item:     Microwave oven
Location: UK
Desired:  Price, make, model, size, power
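
To make "a few relevant, accurate, location-specific answers" concrete, a single answer under these criteria might come back looking something like this (every value here is invented for illustration):

    Item:  Microwave oven
    Make:  Acme
    Model: MW-900
    Price: GBP 89.99
    Size:  20 litres
    Power: 900W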

Description/Justification

That's an ambitious, maybe even arrogant, title, but I'll try to clarify my idea and maybe someone will suggest a better one.

Have you ever gone out on the web looking to find information about a particular item you are considering purchasing?

I recently needed to replace my 20-year-old microwave oven. So, I started out hitting the web sites of one or two of the national chains of white-goods suppliers here in the UK. The result was an intensely frustrating experience.

Next, I tried Google to locate some information on "microwaves 900W UK price" and a whole slew of variations. Half the sites that turned up were US sites. Half of the rest were "comparison shopping" sites that seemingly catch everything. Of those left, actually extracting the knowledge that I was after from amongst the noise was just too painful (and probably unnecessary) to relate.

So, what I am looking for is a "knowledge protocol".

There is an adage whose provenance I am not sure of, nor could I locate it, but it says that:

Anything (literally anything; words, sounds, numbers, blades of grass, fossilised faeces, ash on the carpet, or the absence thereof; anything) is data.

Once collated (in some fashion), data can become information. Whether said information is useful to any particular viewer is dependent upon a variety of things.

But what I am seeking is not information. If I visit the Ferrari website looking for data about the fuel consumption of their vehicles, I might be presented with a banner informing me that

Michael Schumacher's wife's sister has a friend that markets deodorant products for porcines.

This may well be "information" (of the FYI or FWIW kind), but it certainly isn't what I went there seeking.

It isn't knowledge.

So what would a knowledge protocol allow me to do?

Scenario: I send a query of the form

Item:     Microwave oven
Location: UK
Desired:  Price, make, model, size, power

to some anonymous email resender (controversial, but why not use the distributive power of spam for good rather than bad?).

The resender forwards the query to anyone who has registered as a respondent to enquiries concerning "Microwave ovens" in the "UK". For the registration process, think along the lines of subscribing to newsgroups and mailing lists.
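
As a rough sketch of the resender's half of that bargain (the registry layout, transport and names here are all my own invention, not part of any defined protocol):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical in-memory registry: "subject|location" => endpoints that
    # have subscribed to queries on that subject in that location.
    my %registry = (
        'microwave ovens|uk' => [ 'shop-a.example.com:7777',
                                  'shop-b.example.com:7777' ],
    );

    # Stand-in for whatever transport (SMTP, raw TCP, ...) is finally chosen.
    sub send_anonymised {
        my ($endpoint, $query) = @_;
        print "forwarding to $endpoint:\n$query\n";
    }

    sub forward_query {
        my ($subject, $location, $query) = @_;
        my $key        = lc "$subject|$location";
        my @responders = @{ $registry{$key} || [] };
        send_anonymised($_, $query) for @responders;
        return scalar @responders;
    }

    forward_query('Microwave ovens', 'UK',
        "Item: Microwave oven\nLocation: UK\nDesired: Price, make, model, size, power\n");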

The resender forwards the request, devoid of identifying information, to a Knowledge Protocol port.

The daemon listening on that port responds with:

  1. The requested information as defined by the "Desired" card
  2. To make it commercially interesting, a single URL that should lead to a page that expands upon the requested information (and specifically the requested information, not a generic landing page). A toy responder is sketched below.
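
Such a responder daemon might look like the following. The port number, catalogue contents and field names are all invented for illustration; this is a sketch of the shape of the thing, not a specification:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # Invented single-item catalogue for this sketch.
    my %catalogue = (
        'microwave oven' => {
            make => 'Acme',      model => 'MW-900', price => 'GBP 89.99',
            size => '20 litres', power => '900W',
            url  => 'http://shop.example.com/mw-900',
        },
    );

    my $server = IO::Socket::INET->new(
        LocalPort => 7777, Listen => 5, Reuse => 1,
    ) or die "listen: $!";

    while (my $client = $server->accept) {
        # Read "fieldname: value" cards up to a blank line.
        my %query;
        while (my $line = <$client>) {
            last if $line =~ /^\s*$/;
            $query{lc $1} = $2 if $line =~ /^(\w+):\s*(.+?)\s*$/;
        }
        if (my $item = $catalogue{ lc($query{item} || '') }) {
            my @wanted = split /[,.]\s*/, lc($query{desired} || '');
            for my $field (@wanted, 'url') {   # the single URL card rides along
                print {$client} ucfirst($field), ": $item->{$field}\n"
                    if exists $item->{$field};
            }
        }
        close $client;
    }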

Of course, there will be those that will just link to their standard home page, or to a page that carries a redirect to their standard home page, or otherwise try to subvert the rules of the protocol. But here the mailing-list analogy extends to the provision of kill-lists: some way of arranging that if enough people place a particular responder on their cheaters-list, that responder gets de-registered, as a mechanism for keeping responders honest.
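
That policing rule could be as small as this (the 10% threshold is plucked out of the air purely for illustration):

    # De-register a responder once more than 10% of subscribers have
    # placed it on their cheaters-list.
    sub should_deregister {
        my ($cheat_reports, $total_subscribers) = @_;
        return 0 unless $total_subscribers;
        return ($cheat_reports / $total_subscribers) > 0.10;
    }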

This may sound a little like various other things already around, say Froogle, but it's not. First, I've read sincere and reasoned discussion worrying about whether Google isn't becoming rather too powerful. I'm also not sure, but doesn't Froogle take money to place your goods/services on the index?

The whole idea of there being a central registry, or a for-money service, negates the purpose of the protocol. Whilst I would want the protocol to cater for the distribution of commercial information, it should not be limited to, nor dominated by, it.

So, rather than a central server, which would require hardware on which to run, maintenance staff, salaries, benefits packages and so on, why not utilise the power of Kazaa-style distributed filesharing protocols? With a suitably defined and simple protocol, leveraging existing experience with things like ftp/http/smtp, it should be easy to produce simple clients that distribute the database in such a way that there is no need for centralisation and all the overheads that brings with it. Every client becomes a part of the distributed service.
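
The core of such a scheme, deciding which peer is responsible for which registration record, can be naive to start with. A sketch (a real system would want consistent hashing, so that peers can join and leave without re-homing every record):

    use Digest::MD5 qw(md5_hex);

    # Hash the registration key and take it modulo the peer count.
    sub peer_for_key {
        my ($key, @peers) = @_;
        my $slot = hex(substr(md5_hex(lc $key), 0, 8)) % @peers;
        return $peers[$slot];
    }

    # e.g. which node holds the "microwave ovens" registrations for the UK?
    print peer_for_key('microwave ovens|uk',
                       'node1.example.net', 'node2.example.net'), "\n";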

Help needed

That pretty much concludes the inspiration and justification for the idea. However, I am having considerable difficulty trying to whittle it down to a single goal. Part of the idea of the parent post is to bring collective thinking to bear on such problems, so I am going to leave the definition of the goal open for now and settle for a loose set of judgement criteria as the starting point. Maybe, if this thread and this post grab enough mindshare to interest people, both of these will be refined over time to better reflect my aspirations for the idea.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re: X-prize: A Knowledge Protocol
by tilly (Archbishop) on Oct 15, 2004 at 19:55 UTC
    The semantic web folks are trying to do what you want. I have not followed what they are doing though, so I can't give you a sense of how viable it is or isn't. (But it certainly has attracted some interest.)

      I've read a few bits on the semantic web efforts. The problem I see with them is that they are fairly typical of the latest specifications coming out of the W3C and similar bodies.

      All-encompassing; over-engineered; heavyweight.

      The remarkable thing about most of the early protocols upon which the internet is based is just how lightweight and simple they are. You can connect to a telnet server from a WAP-enabled mobile phone and do everything you could from a fully-fledged terminal. You can connect to a pop3 server and do everything from a command line. The same goes for ftp, sftp, smtp and almost all of the other basic protocols.
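
      For instance, the whole of a basic pop3 conversation fits in a dozen typed lines. A session might look something like this (host, user and message sizes invented):

          $ telnet pop.example.com 110
          +OK POP3 server ready
          USER fred
          +OK
          PASS secret
          +OK maildrop locked and ready
          LIST
          +OK 1 message (1024 octets)
          1 1024
          .
          RETR 1
          +OK 1024 octets
          ...message text...
          .
          QUIT
          +OK goodbye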

      All the bells and whistles that a good email program, terminal program and so on layer on top are nice-to-haves, but the underlying protocols that drive the network are simple.

      What I've seen of the semantic web talks about using XML (already too complicated). XPath (worse). The Resource Description Framework (hugely complicated).

      Layers upon layers, complications on top of complications. Simple fundamental principles that stood the early protocols in good stead have been forgotten or ignored.

      Question: What makes XML so difficult to process?

      Answer: You have to read to the end to know where anything is.

      The MIME-era protocols followed early transmission-protocol practice: a part can announce its size up front (think of the Content-Length header). That way, when you're reading something, you know how much you need to read. You can choose to stop when you have what you want.

      XML, on the other hand, forces you to read to the end. You can never be sure that you have anything at all until you have read the closing tag of the first element you received. That's what makes processing XML as a stream such a pain. XML::Twig and similar tools allow you to pretend that you can process bite-sized chunks, but if, at the final step, the last close tag is the wrong one, corrupted, or missing, then all bets are off: according to the XML standard it isn't a "well-formed document", and the standard provides for no recovery or partial interpretation.
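
      To see what length-prefixed framing buys you, here is a sketch of a reader for an invented frame format: a header line of "name byte-count" followed by that many bytes of data. The reader always knows how much to read, and can stop the moment it has what it came for; no closing tag required:

          #!/usr/bin/perl
          use strict;
          use warnings;

          sub read_frame {
              my ($fh) = @_;
              defined(my $header = <$fh>) or return;    # clean EOF
              my ($name, $len) = $header =~ /^(\S+)\s+(\d+)\s*$/
                  or die "malformed frame header: $header";
              read($fh, my $data, $len) == $len
                  or die "short read in frame '$name'";
              return ($name, $data);
          }

          # A consumer can bail out early:
          # while (my ($name, $data) = read_frame(\*STDIN)) {
          #     last if $name eq 'Price';    # got what we came for
          # }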

      Any independent mechanism, like XML::Twig or even the way browsers provide for the handling of imperfectly formed HTML, is outside of the specification and therefore not subject to any rules. This is why different browsers present the same ill-formed HTML in different ways.

      A transmission protocol that didn't provide for error detection and error recovery would be laughed out of court. It's my opinion that any data communication protocol that says "everything must be perfect or we aren't going to play" should equally be laughed out of court.

      The sad thing is, XML could be fixed, in this regard, quite easily.

      I think that continuing the tradition of:

      fieldname: field data\n\n

      has a lot of merit. I also think that decentralising the information-provider directory has huge merit.
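
      A parser for that tradition is a handful of lines, which is much of its merit. This sketch is roughly all the client-side machinery such a protocol would need:

          # Parse "fieldname: field data" lines, terminated by a blank
          # line, into a hash keyed by lowercased field name.
          sub parse_fields {
              my ($fh) = @_;
              my %fields;
              while (my $line = <$fh>) {
                  last if $line =~ /^\s*$/;
                  $fields{lc $1} = $2 if $line =~ /^([^:]+):\s*(.*?)\s*$/;
              }
              return %fields;
          }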

      The problem with what I've read of the semantic web is that either every client has to contact every possible information provider individually to gather information, or it has to contact a central information-provider directory service, which requires large volumes of storage and processing power, and therefore will need to be funded.

      Once you get a service paid for by the client, the responses from the service are controlled by commercial interests, and then become candidates for prioritisation paid for by the information providers.

      Once again the clients--you, me and other Joe Public--end up getting to see only what some commercial service provider is paid the most to show us.


Re: X-prize: A Knowledge Protocol
by BUU (Prior) on Oct 15, 2004 at 20:46 UTC
    Ignoring your suggestions for implementation for the moment, it sounds like what you want is a search engine that returns actual data, not meaningless webpages. Which sounds nice and all, but if it's a third-party program, wouldn't returning just the data be a copyright violation?

    Addressing your implementation ideas, at first read it sounds like you want all of this to be done manually?! You send an email to the list and everyone reads it and possibly responds?
      ...it sounds like what you want is a search engine that returns actual data...
      ...at first read it sounds like you want all of this to be done manually?

      No. The idea is that information providers (commercial or otherwise) register themselves as responders. In doing so, they would provide an IP/port combination that responds to the knowledge protocol, for those query subjects for which they have registered.

      It is up to each information provider to search their own site only (though this might be contracted out) in response to the query, and to return the information requested.

      The distributed database I mentioned would contain only the registration records, kill-list information, maybe even an information-provider rating system, but no actual information by way of answers to queries.
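
      An invented shape for what each such entry might carry (note the deliberate absence of any answer data):

          my %record = (
              endpoint   => 'shop-a.example.com:7777',
              subjects   => [ 'microwave ovens' ],
              locations  => [ 'UK' ],
              kill_votes => 3,      # cheaters-list reports so far
              rating     => 4.2,    # e.g. mean of subscriber scores
          );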

      All the information delivered by the protocol would be supplied by the information providers in response to queries. If they don't want to release the information into the public domain, they simply do not provide it--and their rating/kill-list position should quickly reflect the fidelity of their responses.

      The process would be automated, and probably not be email-based.

