I don't know of any natural language understanding system that could read the nodes you look at, extract their semantic content, and then direct you to other semantically similar web pages. That is beyond the reach of current technology.

But simpler systems are possible and in fact already exist at Perlmonks.

One method of finding relevant nodes is to extract the keyword distributions of well-liked nodes and compare them for similarity with the keyword distributions of other nodes. The SMART information retrieval system at Cornell, for instance, uses an inner product as its similarity measure. Perlmonks has a simple version of this: Super Search. Just pick keywords from nodes you like and Super Search for other nodes containing those keywords.
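As a rough illustration of the keyword-distribution idea, here is a minimal Python sketch. It is not how SMART or Super Search is implemented; the function names, the tiny stopword list, and the sample node texts are all made up for the example. It counts keywords per node, scores candidates against a liked node with a plain inner product, and also offers a length-normalized (cosine) variant, which I added so that long nodes do not win just by being long.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}

def keyword_vector(text):
    """Count the non-stopword words in a node's text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def inner_product(a, b):
    """Sum of products of matching keyword counts."""
    return sum(count * b[word] for word, count in a.items())

def cosine(a, b):
    """Inner product divided by both vector lengths (0.0 if either is empty)."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0

def rank_nodes(liked_text, candidates, similarity=inner_product):
    """Return (title, score) pairs for candidates, most similar first."""
    liked = keyword_vector(liked_text)
    scored = [(title, similarity(liked, keyword_vector(text)))
              for title, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Made-up node texts, purely for demonstration.
    candidates = {
        "Hash slices": "hash slices in perl",
        "Lookahead": "regex lookahead assertions",
    }
    for title, score in rank_nodes("perl hash tricks", candidates):
        print(title, score)
```

Pass `similarity=cosine` to `rank_nodes` to use the normalized score instead of the raw inner product.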

Another method of retrieving relevant nodes is the approach taken by the semantic web people: build ontologies out of meta information added to the nodes. Perlmonks has this too! The meta information comes in the form of categorization. In the code catacombs, Q and A, and tutorial sections, nodes are organized by category, so it is very easy to find nodes on a desired subject. Other sources of meta information on Perlmonks include the author of a node, its child nodes, the Best/Worst nodes of a time period, reputation, etc. Perlmonks is quite rich in meta information.
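To make the meta-information approach concrete, here is a small Python sketch of filtering nodes by their metadata. The `Node` record and its field names are my own invention for illustration and do not reflect how Perlmonks actually stores nodes; the sample data is made up as well.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical fields; not the real Perlmonks schema.
    title: str
    author: str
    section: str
    category: str
    reputation: int

def find_nodes(nodes, min_reputation=None, **criteria):
    """Return nodes whose metadata equals every criterion, best reputation first."""
    matches = [
        node for node in nodes
        if all(getattr(node, field) == value for field, value in criteria.items())
        and (min_reputation is None or node.reputation >= min_reputation)
    ]
    return sorted(matches, key=lambda node: node.reputation, reverse=True)

if __name__ == "__main__":
    # Made-up sample nodes.
    nodes = [
        Node("Intro to regexes", "alice", "Tutorials", "regex", 40),
        Node("Sorting hashes", "bob", "Q and A", "hashes", 12),
        Node("Regex pitfalls", "bob", "Tutorials", "regex", 55),
    ]
    for node in find_nodes(nodes, section="Tutorials", category="regex"):
        print(node.title, node.reputation)
```

Filtering on exact metadata like this is cheap and precise, but it only works as well as the categorization people have already done by hand.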

Naftali Tishby's group has worked on automatically classifying newspaper articles with an information-theoretic clustering algorithm. The algorithm came up with surprisingly sensible clusters. Many clusters could be identified with a particular subject; others with the reporter who wrote the articles. It would be fun to apply such a scheme to the Perlmonks universe. Could such a nonparametric algorithm distinguish a meditation from a tutorial? Positive-reputation nodes from negative ones?

-Mark


In reply to Re: Collaborative filtertering by kvale
in thread Collaborative filtertering by artist