PerlMonks
Natural Language Index Stemming by rob_au (Abbot)
on Jun 18, 2002 at 01:13 UTC (id://175245)
rob_au has asked for the wisdom of the Perl Monks concerning the following question:
I am curious about the experience of others with natural language stemming for site indexes. I ask because I am in the process of rewriting a site search engine (to improve maintainability and to fit the corporate application environment), and I have come across a number of discussions about natural language stemming in this type of application.

For those unfamiliar with the concept, stemming is the process of reducing a word to its stem or root form. This allows similar words such as computer and computing to be conflated, that is, reduced to a single root (for example, comput). This shrinks the index dictionary and, in theory, reduces storage requirements and processing time. A further discussion of this concept can be found here.

While this type of processing reduces the number of index dictionary keys, I am concerned about the likelihood of stemming errors, where dissimilar words are stemmed to the same root. This concern weighs more heavily because indexing speed and space requirements should not be an issue in the application environment. See here for a discussion of over- and under-stemming errors. And so I ask a barrage of questions:
My thanks in advance
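The conflation and the stemming errors described in the question can be illustrated with a toy suffix-stripping stemmer. This is a minimal sketch under stated assumptions, not the Porter algorithm or any library's stemmer. The suffix list `SUFFIXES`, the minimum stem length `MIN_STEM`, and the helpers `stem` and `build_index` are all hypothetical, chosen only to show how index keys collapse.

```python
# Toy suffix-stripping stemmer (hypothetical; NOT the Porter algorithm).
# It exists only to show conflation and over-/under-stemming errors.
from collections import defaultdict

# Hypothetical suffix list, checked longest first so "ation" wins over "s".
SUFFIXES = sorted(["ation", "ing", "ity", "er", "al", "es", "e", "s"],
                  key=len, reverse=True)
MIN_STEM = 3  # hypothetical: never cut a word below this many characters


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def build_index(words):
    """Map each stem (an index dictionary key) to the words it conflates."""
    index = defaultdict(set)
    for w in words:
        index[stem(w)].add(w)
    return dict(index)


if __name__ == "__main__":
    words = ["computer", "computing", "computation",
             "universe", "university", "universal",
             "alumnus", "alumni"]
    for key, group in sorted(build_index(words).items()):
        print(f"{key:10} <- {sorted(group)}")
```

With these rules, computer, computing and computation share the single key `comput`, which is the intended conflation. However, universe, university and universal all collapse to `univers` even though universe and university mean different things (over-stemming). Meanwhile, alumnus becomes `alumnu` while alumni is left untouched, so two forms of one word stay apart (under-stemming). Eight words produce four dictionary keys, which is the space saving traded against these errors.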