The scale -128 to +128 appears to be arbitrary and could just as well be 0 to 256 (a plain count of matching bits).
That inference comes from the author's statement about his example output from two slightly different sources: "The nilsimsa of these two codes is 92 on a scale of -128 to +128. That means that 36 bits are different and 220 bits the same. Any nilsimsa over 24 (which is 3 sigma) indicates that the two messages are probably not independently generated."

BTW, I attach high value to the observations (below) from BrowserUK and sfink, but I'm not sure I'm ready to buy BUK's "no easy way" (no offense intended, BrowserUK!) as (1) gospel or, (2) more important, as any reason not to search for a way.
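For anyone who wants to check the arithmetic in that quote, here is a minimal sketch in Python. It assumes the score is simply 128 minus the number of differing bits between two 256-bit digests given as 64-character hex strings; that assumption reproduces the quoted numbers (128 - 36 = 92) but is my reading of the quote, not code taken from the author. The function names are made up for illustration.

```python
import math

DIGEST_BITS = 256  # assumed digest width: 36 differing + 220 matching bits


def differing_bits(hex_a, hex_b):
    """Count the bit positions where two hex-encoded digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def score(hex_a, hex_b):
    """Score on the -128..+128 scale: 128 minus the differing-bit count (assumed)."""
    return DIGEST_BITS // 2 - differing_bits(hex_a, hex_b)


def matching_bits(hex_a, hex_b):
    """The same information on a 0..256 scale: just the score shifted by 128."""
    return score(hex_a, hex_b) + DIGEST_BITS // 2


def sigma():
    """Std. deviation of the differing-bit count if all bits were independent coin flips."""
    return math.sqrt(DIGEST_BITS * 0.5 * 0.5)


# Two made-up digests that differ in exactly 36 bits (not real nilsimsa output).
digest_a = "0" * 64
digest_b = format((1 << 36) - 1, "064x")

print(score(digest_a, digest_b))          # 92
print(matching_bits(digest_a, digest_b))  # 220
print(3 * sigma())                        # 24.0
```

Under the independent-bits assumption, sigma comes out to 8, so the quoted "24 (which is 3 sigma)" threshold falls out directly, and the 0..256 version is only a shift of the same number.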
In reply to Re^3: Fingerprinting text documents for approximate comparison
by ww
in thread Fingerprinting text documents for approximate comparison
by Mur