PerlMonks
Re^2: word similarity measure
by Gavin (Archbishop) on Feb 28, 2009 at 12:32 UTC ( [id://747134]=note )
Or perhaps Vector Space: this module takes a list of documents (in English) and builds a simple in-memory search engine using a vector space model. Documents are stored as PDL objects, and after the initial indexing phase, searches should be very fast. The implementation applies a rudimentary stop list to filter out very common words and uses a cosine measure to calculate document similarity.
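To make the cosine measure concrete, here is a minimal sketch of the idea in Python. It is not the module's API and not how the module is implemented (the module stores documents as PDL objects). The names `STOP_WORDS`, `term_vector` and `cosine` and the tiny stop list are made up for illustration. Each text becomes a bag of word counts with stop words dropped, and similarity is the dot product of two count vectors divided by the product of their lengths.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def term_vector(text):
    """Turn a text into a sparse term-count vector, skipping stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w and w not in STOP_WORDS)


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing keys, so terms absent from b add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = term_vector("the cat sat")
    doc = term_vector("The cat sat on the mat.")
    print(cosine(query, doc))
```

Texts that share no terms after stop-word filtering score 0, and texts with identical term counts score 1, whatever their length. A real engine would typically also weight terms (for example by how rare they are) before comparing.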