Take the text of a given node. Count occurrences of words that are not useless. Also count occurrences of any consecutive runs of not-useless words, such as "password authentication" in "and password authentication is".
Do this for a few hundred scattered nodes, and add some of the less useful words and phrases found to the useless lists. This step is optional but you'll weed a lot of chaff early if you do it.
    # this node minus chaff
    7 : words
    6 : useless
    4 : nodes
    4 : phrases
    3 : regard
    3 : replies
    3 : reputation
    3 : useful
    ...
    1 : are highly regarded
    1 : password authentication
    1 : useful words
    1 : phrases found depending
    1 : empty useless phrases
    ...
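For what it's worth, here is a rough sketch of the counting step, written in Python purely for illustration. The `USELESS` set, the `count_terms` name and the `max_len` cap on phrase length are all my own assumptions, not anything fixed by the approach. Punctuation is ignored, so a run can span a comma, which is how a phrase like "phrases found depending" can turn up.

```python
import re
from collections import Counter

# Hypothetical starter list; the real one grows as you review nodes.
USELESS = {"the", "a", "of", "and", "is", "are", "that", "to", "not"}

def count_terms(text, useless=USELESS, max_len=3):
    """Count useful words and consecutive runs of useful words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()

    def flush(run):
        # Count every sub-run of 1..max_len words inside this run.
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1

    run = []  # current run of consecutive useful words
    for word in words:
        if word in useless:
            flush(run)   # a useless word ends the current run
            run = []
        else:
            run.append(word)
    flush(run)           # do not forget the final run
    return counts

if __name__ == "__main__":
    for term, n in count_terms("and password authentication is").most_common():
        print(n, ":", term)
```

Sorting the counter by count gives a listing in the same shape as the one above. Whatever looks like chaff in that listing is a candidate for the useless set.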
Define what "regard" is: XP/reputation? Number of replies? Replies by saints? Front-paged? It's up to you to decide what is important.
Now you can start automating. Read nodes. Assign incremental regard to the most prevalent useful words and phrases found, depending on reputation or number of replies. Associate useful phrases with sets of nodes that are highly regarded.
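The automated pass might look something like the sketch below, again in Python and again only one way to do it. It assumes you have already reduced each node to a `(node_id, term_counts, regard)` tuple, where `regard` is whatever number you settled on (reputation, reply count, and so on). The names `accumulate_regard`, `top_n` and `min_regard` are hypothetical.

```python
from collections import defaultdict

def accumulate_regard(nodes, top_n=5, min_regard=0):
    """Credit each node's most prevalent terms with that node's regard.

    nodes: iterable of (node_id, term_counts, regard) tuples, where
    term_counts maps a word or phrase to its count in that node.
    """
    regard_by_term = defaultdict(float)
    nodes_by_term = defaultdict(set)
    for node_id, term_counts, regard in nodes:
        # Most prevalent terms first; ties broken alphabetically.
        top = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for term, _count in top:
            regard_by_term[term] += regard
            # Only remember nodes that count as highly regarded.
            if regard >= min_regard:
                nodes_by_term[term].add(node_id)
    return regard_by_term, nodes_by_term

if __name__ == "__main__":
    sample = [
        (1, {"password authentication": 2, "words": 1}, 10),
        (2, {"password authentication": 1}, 3),
    ]
    regard, where = accumulate_regard(sample, top_n=1, min_regard=5)
    for term in sorted(regard, key=regard.get, reverse=True):
        print(regard[term], ":", term, sorted(where[term]))
```

Terms with the highest accumulated regard are your candidate popular topics, and `nodes_by_term` tells you which well-regarded nodes to look at for each one.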
--
[ e d @ h a l l e y . c c ]
In reply to Re: Tracking popularity of Perl discussion topics
by halley
in thread Tracking popularity of Perl discussion topics
by allolex