in reply to Fingerprinting text documents for approximate comparison
In each fingerprint file I would put, perhaps:
- The number of significant words
- The average number of letters in the top 5 most common words
- The three least common significant words (alphabetized)
- The three most common significant words (alphabetized)
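Here is a minimal Python sketch of building such a fingerprint. The list above does not pin down what a "significant" word is, so this sketch makes several assumptions. It uses a hypothetical stopword list (`STOPWORDS`) and a simple letters-only tokenizer. It counts "number of significant words" as total occurrences. It takes the top-5 average over significant words only. Frequency ties are broken alphabetically, so the same text always produces the same fingerprint. `significant_words` and `fingerprint` are illustrative names, not an existing API.

```python
import re
from collections import Counter

# Hypothetical stopword list: which words count as "significant" is an assumption.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
             "for", "on", "with", "as", "was", "be"}


def significant_words(text):
    """Lowercase the text, split it into runs of letters and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def fingerprint(text):
    """Build a small fingerprint dict from a document's text."""
    counts = Counter(significant_words(text))
    # Most common first; ties broken alphabetically for determinism.
    by_freq = sorted(counts, key=lambda w: (-counts[w], w))
    top5 = by_freq[:5]
    avg_len = sum(len(w) for w in top5) / len(top5) if top5 else 0.0
    # Least common first; ties broken alphabetically.
    by_rarity = sorted(counts, key=lambda w: (counts[w], w))
    return {
        "significant_words": sum(counts.values()),  # total occurrences
        "avg_len_top5": round(avg_len, 2),
        "least_common": sorted(by_rarity[:3]),      # alphabetized
        "most_common": sorted(by_freq[:3]),         # alphabetized
    }
```

For example, `fingerprint("the cat sat on the mat the cat ran")` gives `{'significant_words': 5, 'avg_len_top5': 3.0, 'least_common': ['mat', 'ran', 'sat'], 'most_common': ['cat', 'mat', 'ran']}`. Each dict could then be written to its own fingerprint file.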
You can either use your current checksum or create a checksum of the fingerprint files.
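If you take the second route, one option (an assumption, not the only way) is to serialize the fingerprint in a canonical form and hash it with SHA-256 from Python's `hashlib`. `fingerprint_checksum` is a hypothetical helper, and the fingerprint is assumed to be a JSON-serializable dict. Keep in mind that a digest like this only tells you whether two fingerprints are identical. Fingerprints that differ only slightly produce completely unrelated digests.

```python
import hashlib
import json


def fingerprint_checksum(fp):
    """SHA-256 hex digest of a fingerprint dict, serialized canonically."""
    # sort_keys and fixed separators make the serialization deterministic,
    # so equal fingerprints always produce equal digests.
    canonical = json.dumps(fp, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```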
Use similar checksums to select which fingerprint files to compare; fingerprints that fall within a tolerance you set would be deemed matches.
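Here is a rough sketch of the tolerance test. It assumes each fingerprint has been loaded into a dict with the hypothetical keys `significant_words`, `avg_len_top5`, `least_common` and `most_common`. The thresholds (`count_tol`, `len_tol`, `min_shared`) are made-up defaults for illustration, and `is_match` and `find_matches` are hypothetical helpers. A pair counts as a match when three conditions hold. Their significant-word counts must be within 10% of each other. Their top-5 average lengths must differ by at most half a letter. They must share at least three of the listed common and uncommon words.

```python
def is_match(fp_a, fp_b, count_tol=0.1, len_tol=0.5, min_shared=3):
    """True if two fingerprints fall within the given (hypothetical) tolerances."""
    # Relative difference in the number of significant words.
    n_a, n_b = fp_a["significant_words"], fp_b["significant_words"]
    biggest = max(n_a, n_b)
    count_ok = biggest == 0 or abs(n_a - n_b) / biggest <= count_tol
    # Absolute difference in average letters of the top-5 words.
    len_ok = abs(fp_a["avg_len_top5"] - fp_b["avg_len_top5"]) <= len_tol
    # Overlap between the listed least/most common words.
    words_a = set(fp_a["least_common"]) | set(fp_a["most_common"])
    words_b = set(fp_b["least_common"]) | set(fp_b["most_common"])
    shared_ok = len(words_a & words_b) >= min_shared
    return count_ok and len_ok and shared_ok


def find_matches(target, fingerprints, **tolerances):
    """Names of stored fingerprints that match `target` within tolerance."""
    return [name for name, fp in fingerprints.items()
            if is_match(target, fp, **tolerances)]
```

`find_matches` as written scans every stored fingerprint. With many files, you would first narrow the candidates, as in the selection step described above, and only run `is_match` on those.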
Just my 2 cents' worth, good luck!
Enjoy!
Dageek