Thanks a lot for your outline. It looks like a quick and efficient way of categorizing threads. I would like to add some linguistic knowledge to your algorithm, though. :)

The more I think about how exactly I would implement this, the less I like the idea of just knocking off the most common words, since they can be important in combination with other words. If we want to have big endian, little endian, and the topic of dealing with big or large files, we have to reorganize the [...] I think that by getting rid of what linguists call "functional categories" such as determiners (the, a, these) entirely, and leaving quantifiers (some, many, all) up to the search engine, we might be able to retain those common words that do play some role in defining a topic. I think lexical categories like nouns, verbs, and adjectives/adverbs (but not prepositions) are the way to go.
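To make the idea above concrete, here is a minimal sketch in Python of filtering by word class instead of by frequency. The word lists and the name `topic_words` are hypothetical and deliberately tiny; a real implementation would probably want a part-of-speech tagger rather than hand-written lists. Quantifiers are left alone on purpose, as suggested above.

```python
import re

# Hypothetical, deliberately incomplete closed-class word lists (assumption:
# these stand in for real part-of-speech tagging, which this sketch skips).
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}
PREPOSITIONS = {"of", "in", "on", "with", "to", "for", "by", "at", "from"}

# Quantifiers (some, many, all) are intentionally NOT listed here, so they
# pass through and are left up to the search engine.
FUNCTION_WORDS = DETERMINERS | PREPOSITIONS


def topic_words(title):
    """Return the lowercased words of a title, minus determiners and prepositions."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in FUNCTION_WORDS]


if __name__ == "__main__":
    print(topic_words("Dealing with the big endian files"))
    print(topic_words("Some of these large files"))
```

Common adjectives such as "big" and "large" survive the filter, so "big endian" and "large files" can still be told apart later, which a plain most-frequent-words cutoff might not allow.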

I think what you call "regard" here should be (XP_MAX_REPLY + XP_MIN_REPLY) * 0.5, but I'd have to examine how XP is really distributed across nodes in a thread before going further. Plus, this little formula doesn't add or subtract XP significance according to where the node is nested. (I think it would be a good idea to count replies to replies, maybe down to the third level of nesting. After that, the topic value tends to be either too specific to just a couple of the posters' personal usage, or simply irrelevant to the original question.)
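A rough sketch of that formula in Python, under some assumptions: the `Node` class, the `MAX_DEPTH` cutoff, and the choice to return 0.0 for a thread without replies are all hypothetical, and direct replies to the root are counted as nesting level 1. It only applies the depth cutoff; it does not weight XP by nesting level, which is the part that would still need working out.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DEPTH = 3  # assumption: direct replies are level 1, count down to level 3


@dataclass
class Node:
    """Hypothetical thread node: its XP plus its direct replies."""
    xp: int
    replies: List["Node"] = field(default_factory=list)


def reply_xps(node, depth=1):
    """Collect the XP of all replies under node, down to MAX_DEPTH levels."""
    if depth > MAX_DEPTH:
        return []
    xps = []
    for reply in node.replies:
        xps.append(reply.xp)
        xps.extend(reply_xps(reply, depth + 1))
    return xps


def regard(root):
    """(XP_MAX_REPLY + XP_MIN_REPLY) * 0.5 over the counted replies."""
    xps = reply_xps(root)
    if not xps:
        return 0.0  # assumption: a thread without replies has no regard
    return (max(xps) + min(xps)) * 0.5


if __name__ == "__main__":
    thread = Node(10, [Node(4, [Node(20, [Node(2, [Node(100)])])]), Node(8)])
    print(regard(thread))  # the level-4 reply (XP 100) is ignored
```

Note that this is the midpoint of the extremes, not the mean, so a single very high or very low reply moves it a lot. That is one reason to look at the real XP distribution first.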

--
Allolex


In reply to Re: Re: Tracking popularity of Perl discussion topics by allolex
in thread Tracking popularity of Perl discussion topics by allolex
