PerlMonks
There is a little experiment that is instructive in this case: take a text (any text) and get the histogram of its characters (that is, how many of each there are), then of every pair (note that "abc" contains the pairs "ab" and "bc"), then of every triple, and so on. Obviously, if you go all the way to the length of the text, the text can be reconstructed from the set of histograms. Now the real test begins: how large a set (how many histograms) do you need to reconstruct the text approximately?
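The histogram step can be sketched roughly as follows. This is only an illustration in Python, not code from the post; the names `ngram_histogram` and `histogram_set` are made up for the example.

```python
from collections import Counter


def ngram_histogram(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def histogram_set(text, max_n):
    """Build the histograms for n = 1 .. max_n, keyed by n."""
    return {n: ngram_histogram(text, n) for n in range(1, max_n + 1)}


hists = histogram_set("abcab", 3)
print(hists[1])  # single characters
print(hists[2])  # overlapping pairs: "ab", "bc", "ca", "ab"
print(hists[3])  # overlapping triples
```

Note that the windows overlap, so a text of length L yields L - n + 1 entries in the histogram for length n, and the histogram for n = L holds the text itself exactly once.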
To reconstruct the text, use a random number generator to output letters, checking that the constructed string meets all the statistical properties of the set of histograms. The interesting result is that most texts need only 9 histograms. What if you only compare the histograms?
In reply to Re: Brainstorming session: detecting plagiarism
by Anonymous Monk
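A rough way to play with the reconstruction and comparison ideas from the post is sketched below. It is a simplification, not the procedure the post describes: instead of checking a candidate string against the whole set of histograms, it samples each next character from a single n-gram histogram (a Markov-style generator), so every n-gram in the output is guaranteed to occur in the source. The cosine similarity used for comparing histograms is likewise just one possible choice, and the names `sample_text` and `cosine_similarity` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def ngram_histogram(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def sample_text(source, n, length, seed=0):
    """Emit random characters so that every n-gram of the output occurs in source."""
    if n < 2 or len(source) < n:
        raise ValueError("need n >= 2 and a source of at least n characters")
    rng = random.Random(seed)
    # Map each (n-1)-character context to the characters seen right after it.
    followers = defaultdict(Counter)
    for gram, count in ngram_histogram(source, n).items():
        followers[gram[:-1]][gram[-1]] += count
    out = source[:n - 1]  # assumption: start from the source's opening context
    while len(out) < length:
        options = followers.get(out[-(n - 1):])
        if not options:  # dead end: this context only occurs at the end of source
            break
        chars = list(options)
        out += rng.choices(chars, weights=[options[c] for c in chars])[0]
    return out[:length]


def cosine_similarity(a, b):
    """Compare two histograms as count vectors; 1.0 means identical proportions."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


source = "the quick brown fox jumps over the lazy dog and the quick cat"
print(sample_text(source, 3, 40))
print(cosine_similarity(ngram_histogram(source, 2), ngram_histogram("the lazy fox", 2)))
```

Raising `n` makes the sampled text follow the source more closely, and with `n` equal to the length of the source the sampler can only reproduce the source itself. Comparing only the histograms, as the post asks, skips the reconstruction entirely and scores two texts by how similar their n-gram counts are.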