Re: Collaborative filtering

I don't know of any natural language understanding system that could read the nodes you look at, extract the semantic content, and then direct you to other semantically similar web pages. That is outside the boundary of current technology.

But simpler systems are possible and in fact already exist at Perlmonks.

One method of finding relevant nodes is to extract the keyword distributions of well-liked nodes and compare them with the keyword distributions of other nodes for similarity. The SMART information retrieval system at Cornell uses an inner product metric as the similarity measure, for instance. Perlmonks has a simple version of this: Super Search. Just pick keywords from nodes you like and super search for nodes with those keywords.
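As a rough sketch of the keyword-distribution idea (not how SMART or Super Search actually work internally), here is one way to score nodes against a node you liked. The function names, the tokenizing regex, and the sample node texts are all made up for illustration, and I normalize the inner product to a cosine so that long nodes don't win just by being long.

```python
import math
import re
from collections import Counter


def keyword_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def inner_product(a, b):
    """Sum of count products over the words the two nodes share."""
    return sum(count * b[word] for word, count in a.items() if word in b)


def cosine_similarity(a, b):
    """Inner product scaled by vector lengths (an assumed normalization)."""
    norm = math.sqrt(inner_product(a, a) * inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


def rank_nodes(liked_text, candidates):
    """Return (score, title) pairs, most similar first.

    candidates is a hypothetical dict mapping node title to node text.
    """
    liked = keyword_counts(liked_text)
    scored = [
        (cosine_similarity(liked, keyword_counts(text)), title)
        for title, text in candidates.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    nodes = {
        "hash sorting": "sort a hash by its value",
        "regex basics": "match a regex pattern",
    }
    for score, title in rank_nodes("sort hash by value", nodes):
        print(f"{score:.2f}  {title}")
```

A real system would also want to drop stop words and weight rare keywords more heavily, but the comparison step stays the same.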

Another method of retrieving relevant nodes is to take an approach used by the semantic web people: create ontologies through the use of meta information added to the nodes. Perlmonks has this too! The meta information comes in the form of categorization. In the Code Catacombs, Q and A, and Tutorials sections, nodes are organized by category, and it is very easy to find nodes on a desired subject. Other sources of meta information on Perlmonks are the author of a node, its child nodes, the Best/Worst nodes of a time period, reputation, etc. Perlmonks is quite rich in meta information.

There is work by Naftali Tishby's group on the automatic classification of newspaper articles using an information-theoretic clustering algorithm. The algorithm came up with surprisingly sensible clusters. Many clusters could be identified with a particular subject; others with the reporter who wrote the article. It would be fun to apply such a scheme to the Perlmonks universe. Could such a nonparametric algorithm distinguish a meditation from a tutorial? Positive reputation nodes from negative?
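To give a feel for what "information-theoretic" can mean here, the sketch below measures how far apart two nodes' word distributions are using the Jensen-Shannon divergence. This is my own stand-in, not the algorithm from that work; a clustering scheme could merge the nodes whose distributions are closest under a measure like this. The function names are hypothetical.

```python
import math
from collections import Counter


def distribution(words):
    """Turn a list of words into a probability distribution over words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; q must cover every word in p."""
    return sum(pv * math.log2(pv / q[word]) for word, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: 0 for identical, 1 bit for disjoint."""
    vocab = set(p) | set(q)
    # The midpoint distribution is nonzero wherever p or q is nonzero.
    mid = {word: 0.5 * (p.get(word, 0.0) + q.get(word, 0.0)) for word in vocab}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


if __name__ == "__main__":
    tutorial = distribution("use strict and warnings in every script".split())
    meditation = distribution("why do we use strict at all".split())
    print(js_divergence(tutorial, meditation))
```

Unlike the plain KL divergence, this measure is symmetric and stays finite when one node uses a word the other never does, which is why I picked it for the sketch.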

-Mark
