in reply to Tracking popularity of Perl discussion topics
Take the text of a given node. Count occurrences of words that are not useless (common filler such as "the", "and", "is"). Also count occurrences of any consecutive run of not-useless words: in "and password authentication is", for example, the run is "password authentication".
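A minimal sketch of that counting step, assuming a small hypothetical useless-word list (the post does not say what is on it) and treating a "run" as a maximal stretch of two or more consecutive useful words. `USELESS_WORDS`, `tokenize` and `count_terms` are illustrative names, not anything from PerlMonks.

```python
import re
from collections import Counter

# Hypothetical "useless words" list; the post does not say what is on it.
USELESS_WORDS = {
    "a", "an", "and", "as", "at", "be", "but", "by", "for", "if", "in",
    "is", "it", "of", "on", "or", "so", "that", "the", "this", "to",
    "was", "with", "you",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_terms(text, useless=USELESS_WORDS):
    """Count useful words, plus each maximal run of two or more useful words."""
    words = Counter()
    phrases = Counter()
    run = []  # consecutive useful words seen since the last useless word

    def flush():
        # A finished run of two or more useful words counts as one phrase.
        if len(run) >= 2:
            phrases[" ".join(run)] += 1
        run.clear()

    for token in tokenize(text):
        if token in useless:
            flush()
        else:
            words[token] += 1
            run.append(token)
    flush()  # close a run that ends the text
    return words, phrases


words, phrases = count_terms("Count useful words and password authentication is useful.")
print(phrases.most_common())
```

In this sketch punctuation is simply dropped, so a comma does not break a run; a stricter version could call `flush()` at sentence or clause boundaries instead.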
Do this for a few hundred scattered nodes, then add the less useful words and phrases you find to the useless lists. This step is optional, but doing it weeds out a lot of chaff early.
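One way to shortlist chaff from that sample, sketched under an assumption the post does not make: a word that appears in a large fraction of the sampled nodes (high document frequency) probably says little about the topic. The `min_fraction` threshold and the function name are hypothetical, and this version handles single words only; the final call on what goes into the useless list is still yours.

```python
import re
from collections import Counter


def stoplist_candidates(node_texts, min_fraction=0.5, useless=frozenset()):
    """Suggest words found in at least `min_fraction` of the sampled nodes.

    Words that show up in most nodes carry little topical signal, so they are
    candidates for the useless list. A human should still review the output.
    """
    doc_freq = Counter()
    for text in node_texts:
        # Count each word at most once per node (document frequency).
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        doc_freq.update(tokens - set(useless))
    threshold = min_fraction * len(node_texts)
    return sorted(w for w, n in doc_freq.items() if n >= threshold)


sample = ["the perl regex", "the cgi module", "the dbi handle", "perl dbi"]
candidates = stoplist_candidates(sample, min_fraction=0.75)
print(candidates)
```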
    # this node minus chaff
    7 : words
    6 : useless
    4 : nodes
    4 : phrases
    3 : regard
    3 : replies
    3 : reputation
    3 : useful
    ...
    1 : are highly regarded
    1 : password authentication
    1 : useful words
    1 : phrases found depending
    1 : empty useless phrases
    ...
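A listing in that "count : term" shape can be produced from the word and phrase counters with a small formatter like the one below. It is a sketch: the function name is hypothetical, ties are broken alphabetically (the listing above does not necessarily do that), and the sample counts are a few entries copied from the listing.

```python
from collections import Counter


def format_report(counts, title="this node minus chaff"):
    """Render term counts as '<count> : <term>' lines, most frequent first."""
    lines = ["# " + title]
    # Descending count; ties broken alphabetically so the output is stable.
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{n} : {term}")
    return "\n".join(lines)


words = Counter({"words": 7, "useless": 6, "nodes": 4})
phrases = Counter({"password authentication": 1})
report = format_report(words + phrases)  # merge words and phrases into one listing
print(report)
```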
Define what "regard" is: XP/reputation? Number of replies? Replies by saints? Front-paged? It's up to you to decide what is important.
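If you settle on several of those signals, one simple option is a weighted sum. Everything below is hypothetical: the `NodeStats` fields stand in for whatever you can actually pull from a node, and the weights are placeholders for exactly the decision the post leaves up to you.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node metrics; field names are illustrative only."""
    reputation: int       # XP/reputation of the node
    replies: int          # total number of replies
    saint_replies: int    # replies written by saints
    front_paged: bool     # whether the node was front-paged


# Hypothetical weights: pick these to reflect what you consider important.
WEIGHTS = {"reputation": 1.0, "replies": 0.5, "saint_replies": 2.0, "front_paged": 10.0}


def regard(node, weights=WEIGHTS):
    """Combine the chosen metrics into a single regard score (a weighted sum)."""
    return (weights["reputation"] * node.reputation
            + weights["replies"] * node.replies
            + weights["saint_replies"] * node.saint_replies
            + weights["front_paged"] * (1 if node.front_paged else 0))


score = regard(NodeStats(reputation=12, replies=4, saint_replies=1, front_paged=True))
print(score)
```

Setting a weight to zero drops that signal entirely, so "regard is just reputation" is the special case where only `reputation` has a nonzero weight.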
Now you can start automating. Read nodes in bulk. For each node, assign incremental regard to its most prevalent useful words and phrases, weighted by the node's reputation or number of replies. Associate useful phrases with the sets of highly regarded nodes in which they appear.
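The accumulation step might look like the sketch below. It assumes each node has already been reduced to a regard score and a mapping of useful terms to counts (for instance by the counting and scoring steps described earlier); the `top_n` cutoff, the `high_regard` threshold and the rule "a phrase is any term containing a space" are all hypothetical choices.

```python
from collections import Counter, defaultdict


def accumulate_regard(nodes, top_n=10, high_regard=20.0):
    """Spread each node's regard over its most prevalent useful terms.

    `nodes` yields (node_id, regard_score, term_counts) tuples, where
    term_counts maps useful words/phrases to their counts in that node.
    Returns (term_regard, term_nodes): total regard per term, and for each
    phrase the set of highly regarded node ids it appeared in.
    """
    term_regard = Counter()
    term_nodes = defaultdict(set)
    for node_id, score, term_counts in nodes:
        # Only the node's most prevalent terms receive credit.
        for term, _count in Counter(term_counts).most_common(top_n):
            term_regard[term] += score
            # Associate multi-word phrases with highly regarded nodes.
            if score >= high_regard and " " in term:
                term_nodes[term].add(node_id)
    return term_regard, dict(term_nodes)


nodes = [
    (101, 30.0, {"password authentication": 2, "cgi": 1}),
    (102, 5.0, {"password authentication": 1, "regex": 3}),
]
term_regard, term_nodes = accumulate_regard(nodes)
print(term_regard.most_common(3))
print(term_nodes)
```

Here every top term in a node gets the node's full score regardless of how often it occurs; multiplying by the count instead is an equally reasonable variant.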
--
[ e d @ h a l l e y . c c ]
Re: Re: Tracking popularity of Perl discussion topics
by allolex (Curate) on Sep 10, 2003 at 06:24 UTC