Re^5: Fingerprinting text documents for approximate comparison

maybe this is too dumb for words, but it's late...zzzZzzzzz....

Couldn't the OP use your technique AND, in parallel, run an m// over the rejected words (i.e., misspellings, proper names, typos, etc.) to get a simple correlation of those, then weight the extent of that correlation into the other score? Or does that load things up so heavily that his concern for processing time overwhelms the project?
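Roughly what I have in mind, as an untested-in-anger sketch. Everything here is my own invention for illustration: the helper names (rejected_words, jaccard, combined_score), the use of Jaccard overlap as the "simple correlation", and the $weight knob. It assumes the dictionary is a hash of lowercased words and that the main technique already hands back a score between 0 and 1.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the words of $text that the dictionary rejects
# (misspellings, proper names, typos). $dict is a hashref of lowercased words.
sub rejected_words {
    my ($text, $dict) = @_;
    my %seen;
    for my $word ($text =~ m/([A-Za-z']+)/g) {
        my $lc = lc $word;
        $seen{$lc} = 1 unless $dict->{$lc};
    }
    return \%seen;
}

# Jaccard overlap of two sets stored as hash keys: |A and B| / |A or B|.
# Returns 0 when both sets are empty.
sub jaccard {
    my ($x, $y) = @_;
    my %union = (%$x, %$y);
    return 0 unless %union;
    my $common = grep { $y->{$_} } keys %$x;
    return $common / scalar(keys %union);
}

# Blend the main fingerprint score with the rejected-word overlap.
# $weight (0..1) is an arbitrary tuning knob, not a recommended value.
sub combined_score {
    my ($main_score, $rejects_a, $rejects_b, $weight) = @_;
    return (1 - $weight) * $main_score
         + $weight * jaccard($rejects_a, $rejects_b);
}
```

The extra cost is one regex pass and one hash lookup per word, so it should scale about like the dictionary check itself. Whether that fits the OP's time budget is his call.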

.oO noodling...

And a third test for common misspellings and typos, to eliminate most of their noise? I think I've seen a "dictionary" of (allegedly statistically sound) typos/misspellings somewhere.
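For the typo pass, something like the following might do. It is only a sketch: the %TYPO_FIX entries are placeholders I made up, not taken from any published list, and normalize_words is a hypothetical helper name. The idea is to map known typos back to their intended word before any comparison, so they stop counting as noise.

```perl
use strict;
use warnings;

# Placeholder entries only; a real table would be loaded from whatever
# typo/misspelling list the OP decides to trust.
my %TYPO_FIX = (
    teh      => 'the',
    recieve  => 'receive',
    seperate => 'separate',
);

# Hypothetical helper: lowercase each word and replace known typos
# with their correction; unknown words pass through unchanged.
sub normalize_words {
    my ($text) = @_;
    return map {
        my $w = lc $_;
        exists $TYPO_FIX{$w} ? $TYPO_FIX{$w} : $w;
    } $text =~ m/([A-Za-z']+)/g;
}
```

Run this before the dictionary check, and whatever is still rejected afterwards should be mostly proper names and genuinely odd words.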
