I've been playing around with this one a little, but there is no way I'm going to be able to cope with the number of documents we have using a traditional pairwise comparison. For the record, I'm quite happy with the similarity score that Text::Compare produces.
I would love to have some more information about this dictionary approach. Does it mean that I create one vector for, say, 100K documents, and then compare each document against that "dictionary record", or is it something else?
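
To make the question concrete, here is how I currently picture it. This is only my guess at what is meant, not necessarily the approach that was suggested: one shared dictionary maps every term in the collection to an index, each document becomes a sparse vector of term counts over that dictionary, and similarity is the cosine between two such vectors. It is written in Python rather than Perl purely for illustration, and all the function names (`tokenize`, `build_dictionary`, `to_vector`, `cosine`) are made up for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric terms (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(docs):
    # One shared mapping term -> index, built once over the whole collection.
    dictionary = {}
    for text in docs:
        for term in tokenize(text):
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary


def to_vector(text, dictionary):
    # Sparse vector: {term index: count}, keeping only terms the dictionary knows.
    counts = Counter(tokenize(text))
    return {dictionary[t]: c for t, c in counts.items() if t in dictionary}


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(i, 0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat sat on a mat", "dogs bark loudly"]
    dictionary = build_dictionary(docs)
    vectors = [to_vector(d, dictionary) for d in docs]
    print(cosine(vectors[0], vectors[1]))
    print(cosine(vectors[0], vectors[2]))
```

If that is roughly the idea, then the dictionary is not something each document is compared against. It is only the common coordinate system, and the comparisons still happen between pairs of document vectors. Is that right, or is there a step that avoids the pairwise comparisons altogether?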