in reply to word similarity measure

I think you want to read Building a Vector Search Engine in Perl.

Re^2: word similarity measure
by Gavin (Archbishop) on Feb 28, 2009 at 12:32 UTC

    Or perhaps Vector Space

    This module takes a list of documents (in English) and builds a simple in-memory search engine using a vector space model. Documents are stored as PDL objects, and after the initial indexing phase, the search should be very fast. This implementation applies a rudimentary stop list to filter out very common words, and uses a cosine measure to calculate document similarity.
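To make the "stop list plus cosine measure" idea concrete, here is a minimal pure-Perl sketch. It is not the module's code. It uses plain hashes instead of PDL objects, raw term counts with no weighting, and a tiny hypothetical stop list. Each document becomes a sparse term-frequency vector. Similarity is the cosine of the angle between two vectors: dot(u, v) / (|u| * |v|).

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stop list (not the module's actual list).
my %STOP = map { $_ => 1 } qw(a an and the of on to in is it that);

# Build a sparse term-frequency vector (word => count) from raw text,
# lowercasing every word and skipping stop words.
sub term_vector {
    my ($text) = @_;
    my %tf;
    for my $word ( map { lc $_ } $text =~ /([A-Za-z']+)/g ) {
        next if $STOP{$word};
        $tf{$word}++;
    }
    return \%tf;
}

# Euclidean length of a sparse vector.
sub norm {
    my ($vec) = @_;
    my $sq = 0;
    $sq += $_ ** 2 for values %$vec;
    return sqrt $sq;
}

# Cosine similarity of two sparse vectors; 0 if either vector is empty.
sub cosine_similarity {
    my ( $u, $v ) = @_;
    my $dot = 0;
    for my $term ( keys %$u ) {
        $dot += $u->{$term} * $v->{$term} if exists $v->{$term};
    }
    my $denom = norm($u) * norm($v);
    return $denom ? $dot / $denom : 0;
}

# Usage: rank a few toy documents against a query.
my @docs = (
    'The cat sat on the mat',
    'A cat and a dog',
    'Stock prices fell sharply',
);
my $query  = term_vector('cat on a mat');
my @scores = map { cosine_similarity( $query, term_vector($_) ) } @docs;
for my $i ( sort { $scores[$b] <=> $scores[$a] } 0 .. $#docs ) {
    printf "%.3f  %s\n", $scores[$i], $docs[$i];
}
# prints:
# 0.816  The cat sat on the mat
# 0.500  A cat and a dog
# 0.000  Stock prices fell sharply
```

Because the vectors are normalized, a long document does not win merely by containing more words. The stop list keeps words like "the" and "a" from making unrelated texts look similar.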