I don't have a direct solution, but I would think that algorithms similar to those used for spam fingerprinting might work. I think that Vipul's Razor uses this technique, as do Pyzor and DCC.
Looking through the DCC website, it seems the keyword you might want to search on is 'fuzzy matching'.
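For what it's worth, here is a minimal sketch of one generic fuzzy-fingerprinting technique: word shingling plus MinHash, in Python. It is an illustration under my own assumptions, not what Razor, Pyzor or DCC actually do internally. The shingle size `k=3`, the 64 hash functions, and the use of seed-prefixed BLAKE2b as the hash family are arbitrary choices, and the function names are hypothetical.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text, num_hashes=64, k=3):
    """Fingerprint: for each seeded hash function, keep the minimum hash over all shingles."""
    sh = shingles(text, k)
    signature = []
    for seed in range(num_hashes):
        prefix = seed.to_bytes(4, "big")  # the seed picks a different hash function
        hashes = (
            int.from_bytes(
                hashlib.blake2b(prefix + s.encode("utf-8"), digest_size=8).digest(),
                "big",
            )
            for s in sh
        )
        signature.append(min(hashes, default=0))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions; approximates Jaccard similarity of the shingle sets."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hash functions")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(text_a, text_b, k=3):
    """Exact Jaccard similarity of the shingle sets, for comparison with the estimate."""
    sa, sb = shingles(text_a, k), shingles(text_b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
    doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
    doc3 = "completely unrelated text about perl regular expressions and parsing"
    s1, s2, s3 = (minhash_signature(d) for d in (doc1, doc2, doc3))
    print("doc1 vs doc2:", estimated_similarity(s1, s2), "exact:", jaccard(doc1, doc2))
    print("doc1 vs doc3:", estimated_similarity(s1, s3), "exact:", jaccard(doc1, doc3))
```

Changing one word only disturbs the few shingles that contain it, so near-duplicates keep agreeing in most signature positions while unrelated texts agree in almost none. Since you only need to store the fixed-size signature, you can compare a new document against many old ones without keeping their full text.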
In reply to Re: Fingerprinting text documents for approximate comparison by jhourcle, in thread Fingerprinting text documents for approximate comparison by Mur