in reply to Re^4: Fingerprinting text documents for approximate comparison
in thread Fingerprinting text documents for approximate comparison
Couldn't the OP use your technique and, in parallel, run an m// over the rejected words (i.e., misspellings, proper names, typos, etc.) for a simple correlation of those, then weight that result into the main score? Or does that load things up so heavily that his concern for processing time overwhelms the project?
.oO noodling...
And a third test for common misspellings and typos, to eliminate most of their noise? I think I've seen somewhere a "dictionary" of (allegedly statistically sound) typos and misspellings.
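A rough sketch of what I'm noodling at, in Python for brevity. Everything here is an assumption on my part: `DICTIONARY` and `COMMON_TYPOS` are tiny hypothetical stand-ins for a real word list and a real typo table, plain Jaccard overlap stands in for the parent's fingerprint comparison (it is not that technique), and `reject_weight` is an arbitrary knob.

```python
import re

# Hypothetical stand-ins: a real run would load a full word list and a
# published table of common misspellings.
DICTIONARY = {"the", "quick", "brown", "fox", "lazy", "dog", "receive"}
COMMON_TYPOS = {"teh": "the", "recieve": "receive"}


def split_words(text):
    """Return (known, rejected) word sets for one document."""
    known, rejected = set(), set()
    for word in re.findall(r"[a-z']+", text.lower()):
        # Third test: fold well-known typos back to the intended word.
        word = COMMON_TYPOS.get(word, word)
        if word in DICTIONARY:
            known.add(word)
        else:
            # Proper names, unknown typos, jargon, and so on.
            rejected.add(word)
    return known, rejected


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(text_a, text_b, reject_weight=0.25):
    """Blend dictionary-word similarity with rejected-word similarity."""
    known_a, rejected_a = split_words(text_a)
    known_b, rejected_b = split_words(text_b)
    main = jaccard(known_a, known_b)        # stand-in for the main comparison
    side = jaccard(rejected_a, rejected_b)  # correlation of the rejects
    return (1 - reject_weight) * main + reject_weight * side
```

Calling `combined_score("Teh quick brown fox, says Zaphod", "The quick brown fox, says Zaphod")` treats the two as identical, since "teh" is folded to "the" before comparison. The extra cost per document is one more pass over its words plus a hash lookup each, so whether it swamps the OP's time budget depends mostly on how many documents get compared pairwise.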