Take a list of the N most common words in the English language. This is your "useless words" list. Also create an empty "useless phrases" list.

Take the text of a given node. Count occurrences of words that are not useless. Also count occurrences of any consecutive runs of not-useless words, such as "password authentication" in "and password authentication is".
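A minimal sketch of the counting step, assuming a tiny hypothetical stand-in for the "N most common words" list (a real one would come from a word-frequency corpus) and treating a "run" as a maximal stretch of two or more consecutive not-useless words. With "and" and "is" in the useless list, the text "and password authentication is" contributes the phrase "password authentication".

```python
import re
from collections import Counter

# Hypothetical stand-in for "the N most common English words".
USELESS_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
                 "it", "for", "on", "that", "this", "you", "your", "with"}
USELESS_PHRASES = set()  # starts empty, as described above


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_node(text, useless_words=USELESS_WORDS, useless_phrases=USELESS_PHRASES):
    """Count not-useless words, plus maximal runs (2+ words) of them."""
    words = Counter()
    phrases = Counter()
    run = []

    def flush():
        # A run of two or more consecutive not-useless words counts as a phrase.
        if len(run) >= 2:
            phrase = " ".join(run)
            if phrase not in useless_phrases:
                phrases[phrase] += 1
        run.clear()

    for token in tokenize(text):
        if token in useless_words:
            flush()              # a useless word ends the current run
        else:
            words[token] += 1
            run.append(token)
    flush()                      # close any run left at the end of the text
    return words, phrases
```

For example, `count_node("The password authentication and password authentication is on.")` counts "password" and "authentication" twice each and the phrase "password authentication" twice.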

Do this for a few hundred scattered nodes, then add the least informative of the words and phrases you found to the useless lists. This step is optional, but doing it weeds out a lot of chaff early.
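Deciding which terms are "less useful" is left to judgement here. One possible heuristic, assumed for illustration and not taken from the post, is document frequency: a term that shows up in most of the sampled nodes says little about any one of them. The `max_fraction` threshold is a hypothetical tuning knob.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def terms_in(text, useless_words):
    """Distinct not-useless words and maximal multi-word runs in one node."""
    terms = set()
    run = []
    for token in tokenize(text):
        if token in useless_words:
            if len(run) >= 2:
                terms.add(" ".join(run))
            run = []
        else:
            terms.add(token)
            run.append(token)
    if len(run) >= 2:                # run still open at end of text
        terms.add(" ".join(run))
    return terms


def prune_chaff(sample_texts, useless_words, useless_phrases, max_fraction=0.5):
    """Move terms found in more than max_fraction of sampled nodes to the useless lists."""
    doc_freq = Counter()
    for text in sample_texts:
        doc_freq.update(terms_in(text, useless_words))   # each node counts once per term
    limit = max_fraction * len(sample_texts)
    for term, n in doc_freq.items():
        if n > limit:
            (useless_phrases if " " in term else useless_words).add(term)
    return useless_words, useless_phrases
```

The sets are updated in place, so the same lists can be fed straight into the next round of counting.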

    # this node minus chaff
    7 : words
    6 : useless
    4 : nodes
    4 : phrases
    3 : regard
    3 : replies
    3 : reputation
    3 : useful
    ...
    1 : are highly regarded
    1 : password authentication
    1 : useful words
    1 : phrases found depending
    1 : empty useless phrases
    ...

Define what "regard" is: XP/reputation? Number of replies? Replies by saints? Front-paged? It's up to you to decide what is important.
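As an illustration only, regard could be a weighted sum of whichever signals you pick. The `Node` fields and the weights below are hypothetical; PerlMonks does not hand you the data in this shape, and the weights simply encode your own idea of what matters.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node statistics."""
    reputation: int      # XP/reputation the node has received
    replies: int         # number of replies
    saint_replies: int   # replies written by saints
    front_paged: bool    # whether the node was front-paged


# Assumed weights; tune them to your own definition of "regard".
WEIGHTS = {"reputation": 1.0, "replies": 2.0, "saint_replies": 5.0, "front_paged": 20.0}


def regard(node, weights=WEIGHTS):
    """Combine the chosen signals into one regard score (a weighted sum)."""
    return (weights["reputation"] * node.reputation
            + weights["replies"] * node.replies
            + weights["saint_replies"] * node.saint_replies
            + weights["front_paged"] * (1 if node.front_paged else 0))
```

With these weights, `regard(Node(10, 3, 1, True))` is 10 + 6 + 5 + 20 = 41.0.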

Now you can start automating. Read nodes. For each node, assign incremental regard to the most prevalent useful words and phrases found, in proportion to the node's regard (its reputation or number of replies, for example). Associate useful phrases with the sets of highly regarded nodes in which they appear.
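A minimal sketch of that loop, assuming each node arrives as a hypothetical `(node_id, text, regard)` triple with regard already computed. "Most prevalent" is taken to mean the node's `top_k` most frequent terms, and "highly regarded" means at or above an assumed `high_regard` cutoff; both are illustrative parameters.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def count_terms(text, useless_words, useless_phrases):
    """Counter of not-useless words and maximal runs (2+ words) of them."""
    counts = Counter()
    run = []

    def flush():
        phrase = " ".join(run)
        if len(run) >= 2 and phrase not in useless_phrases:
            counts[phrase] += 1
        run.clear()

    for token in tokenize(text):
        if token in useless_words:
            flush()
        else:
            counts[token] += 1
            run.append(token)
    flush()
    return counts


def accumulate(nodes, useless_words, useless_phrases, top_k=10, high_regard=50.0):
    """Add each node's regard to its top_k terms; link phrases to well-regarded nodes."""
    term_regard = defaultdict(float)
    phrase_nodes = defaultdict(set)
    for node_id, text, node_regard in nodes:
        counts = count_terms(text, useless_words, useless_phrases)
        for term, _ in counts.most_common(top_k):
            term_regard[term] += node_regard              # incremental regard
            if " " in term and node_regard >= high_regard:
                phrase_nodes[term].add(node_id)           # phrase seen in a well-regarded node
    return dict(term_regard), dict(phrase_nodes)
```

Sorting `term_regard` by value then gives a running ranking of topics, and `phrase_nodes` points from each phrase back to the well-regarded nodes that used it.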

--
[ e d @ h a l l e y . c c ]


In reply to Re: Tracking popularity of Perl discussion topics by halley
in thread Tracking popularity of Perl discussion topics by allolex
