in reply to Fingerprinting text documents for approximate comparison
I don't have a direct solution, but I would think that algorithms similar to those used for spam fingerprinting might work. I believe Vipul's Razor uses this technique, as do Pyzor and DCC.
Looking through the DCC website, it seems the keyword you might want to search for is 'fuzzy matching'.
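To give a rough idea of what a fuzzy fingerprint can look like, here is a minimal sketch in Python. It is not how Razor, Pyzor or DCC actually compute their checksums (I haven't checked their internals); it is just one common approach, word shingles plus a "bottom-k" sample of their hashes. The function names, the shingle size of 3 and the fingerprint size of 64 are all my own assumptions for illustration.

```python
import hashlib
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the normalised text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def fingerprint(text, k=64, size=3):
    """Hypothetical fingerprint: the k smallest 64-bit shingle hashes."""
    hashes = {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, size)
    }
    return sorted(hashes)[:k]


def similarity(fp_a, fp_b, k=64):
    """Estimate Jaccard similarity (0.0 to 1.0) from two fingerprints."""
    if not fp_a or not fp_b:
        return 0.0
    # Look at the k smallest hashes overall and count those seen in both.
    smallest = sorted(set(fp_a) | set(fp_b))[:k]
    shared = set(fp_a) & set(fp_b)
    return sum(1 for h in smallest if h in shared) / len(smallest)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog near the river bank"
    b = "the quick brown fox jumps over the lazy cat near the river bank"
    print(similarity(fingerprint(a), fingerprint(b)))
```

Only the short fingerprints need to be stored and compared, so two documents that differ by a few words still score well above zero, whereas an ordinary checksum of the whole text would not match at all. Where to set the threshold for "similar enough" is something you would have to tune on your own documents.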