in reply to Fingerprinting text documents for approximate comparison

There's Digest::Nilsimsa, which might be worth looking into. I haven't tried it, but it sounds like it's designed to do what you're looking for.