in reply to Tracking popularity of Perl discussion topics

The root of your problem would be in identifying what the topics of a given thread/node are.

Assuming you don't want to hire a team of helper monkeys to classify every node into a taxonomy, you would need to look into some of the many attempts at classifying documents by programmatic analysis. This comes up about once a year on Slashdot in the form of a "has anyone found a good way to categorize all of your email?" question ... there seem to be some decent algorithms out there for doing classification, but many of them require a predefined list of categories with sample sets.
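To make "a predefined list of categories with sample sets" concrete, here is a minimal sketch of one such approach, a naive Bayes classifier with add-one smoothing. It is written in Python only for brevity; the category names, sample texts, and function names are invented for illustration and are not taken from any particular system.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """samples maps each category name to a list of example texts."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of samples
    for category, texts in samples.items():
        for text in texts:
            doc_counts[category] += 1
            word_counts[category].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest naive Bayes log score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best, best_score = None, float("-inf")
    for category in doc_counts:
        counts = word_counts[category]
        total = sum(counts.values())
        # Log prior for the category, then add-one smoothed word likelihoods.
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical categories and sample sets, made up for this example.
SAMPLES = {
    "security": [
        "how do I hash a password for login",
        "taint mode and untrusted input",
    ],
    "regex": [
        "regex to match nested parentheses",
        "greedy versus lazy quantifiers in a pattern",
    ],
}

model = train(SAMPLES)
print(classify("storing a login password safely", *model))
```

The catch is the one mentioned above: someone still has to pick the categories and supply the sample texts before anything can be classified.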

I've heard of systems that can find common topics among large quantities of text, but I've never really looked into them in depth.

Re: Re: Tracking popularity of Perl discussion topics
by allolex (Curate) on Sep 09, 2003 at 06:25 UTC

    I have given this one some thought, considering that it is the linguistic aspect of this whole idea. (You know me...)

    How about creating an ontology based on the 'core' vocabulary of the highest-rated nodes of a hand-marked thread? For example, we have a question or meditation that we mark as 'security', 'password', 'login'. Then we take the nodes from the median score upward (because they are most likely to be relevant to the topic), extract their vocabulary, and store it in a keyword list (with verbs, nouns, adjectives) that represents the junction of the topics mentioned above. That would be a quick and dirty way of defining which members belong to a topic category.
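A rough sketch of that extraction step, in Python only for brevity. The node scores and texts are made up, the function name is hypothetical, and a small stopword list stands in for real part-of-speech filtering (keeping only verbs, nouns, and adjectives would need a tagger).

```python
import re
from collections import Counter
from statistics import median

# Crude stand-in for part-of-speech filtering (an assumption of this sketch).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "at", "with",
    "before", "is", "it", "use", "just", "never", "me", "too",
}


def core_vocabulary(nodes, top_n=10):
    """nodes is a list of (score, text) pairs from one hand-marked thread."""
    cutoff = median(score for score, _ in nodes)
    counts = Counter()
    for score, text in nodes:
        if score >= cutoff:  # keep nodes from the median score upward
            words = re.findall(r"[a-z]+", text.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical thread, hand-marked as 'security', 'password', 'login'.
topics = ("security", "password", "login")
nodes = [
    (25, "Hash the password with a salt before storing the password"),
    (12, "Never store a plain password; check the hash at login"),
    (3, "Just use a text file"),
    (1, "me too"),
]

keyword_lists = {topics: core_vocabulary(nodes)}
print(keyword_lists[topics])
```

The two low-scoring nodes fall below the median and contribute nothing, so the keyword list is built only from the replies the community rated as useful.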

    These could be split up later and put into XML topic maps which, by virtue of their structure, would allow topic clustering on a much larger scale.

    --
    Allolex