PerlMonks  

Re: Brainstorming session: detecting plagiarism

by Anonymous Monk
on Jun 11, 2005 at 01:32 UTC ( [id://465729]=note )


in reply to Brainstorming session: detecting plagiarism

A small experiment is instructive here. Take a text (any text) and build the histogram of its characters, that is, a count of how often each one occurs. Then do the same for every pair of adjacent characters (note that "abc" contains the pairs "ab" and "bc"), then for every triple, and so on. Obviously, if you go all the way up to the length of the text, the set of histograms is enough to reconstruct the text exactly. Now the real test begins: how large a set (how many histograms) do you need to reconstruct the text approximately?
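For anyone who wants to try this, here is a minimal sketch of the histogram step in Python. The function names `ngram_histogram` and `histogram_set` are made up for illustration, and counting overlapping runs of characters is my reading of the "abc" example above.

```python
from collections import Counter

def ngram_histogram(text, n):
    """Count every run of n consecutive characters in text (runs overlap)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def histogram_set(text, max_n):
    """Build the histograms for n = 1 .. max_n, keyed by n."""
    return {n: ngram_histogram(text, n) for n in range(1, max_n + 1)}

if __name__ == "__main__":
    hists = histogram_set("abcab", 2)
    print(hists[1])  # single-character counts
    print(hists[2])  # counts of adjacent pairs
```

Note that a histogram for n longer than the text simply comes back empty.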
To reconstruct the text, use a random number generator to output letters, checking that the constructed string meets all the statistical properties of the set of histograms.
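One rough way to approximate that reconstruction is sketched below. It is a simplification, not the exact procedure described: rather than checking every histogram in the set, it uses only the highest-order one and draws each next letter in proportion to how often it followed the previous n-1 letters in the source. The name `generate` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict

def ngram_histogram(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def generate(text, n, length, seed=0):
    """Emit `length` letters, each drawn according to the n-gram histogram of `text`."""
    if len(text) < n:
        raise ValueError("text must contain at least one n-gram")
    rng = random.Random(seed)
    hist = ngram_histogram(text, n)
    # Map each (n-1)-letter context to the letters seen after it, with counts.
    followers = defaultdict(Counter)
    for gram, count in hist.items():
        followers[gram[:-1]][gram[-1]] += count
    out = text[:n - 1]  # start from the opening context of the source
    while len(out) < length:
        context = out[len(out) - (n - 1):]
        choices = followers.get(context)
        if choices:
            out += rng.choices(list(choices), weights=list(choices.values()))[0]
        else:
            # Dead end (context seen only at the very end of the source):
            # splice in a random n-gram. The seam may contain unseen n-grams.
            out += rng.choices(list(hist), weights=list(hist.values()))[0]
    return out[:length]

if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    for n in (1, 2, 3):
        print(n, repr(generate(source, n, 40)))
```

Running it with increasing n on a longer text gives a feel for how quickly the output starts to resemble the source.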
The interesting result is that most texts need only 9 histograms. So what if you compared only the histograms of two texts?
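A possible way to do that comparison is sketched below. Cosine similarity is my own choice of measure, not something specified above, and the default of 9 orders simply reuses the figure quoted in this post without verifying it. `cosine_similarity` and `similarity_profile` are illustrative names.

```python
import math
from collections import Counter

def ngram_histogram(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms viewed as count vectors."""
    dot = sum(count * h2[gram] for gram, count in h1.items())
    norm1 = math.sqrt(sum(c * c for c in h1.values()))
    norm2 = math.sqrt(sum(c * c for c in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty histogram is treated as sharing nothing
    return dot / (norm1 * norm2)

def similarity_profile(a, b, max_n=9):
    """One similarity score per histogram order, n = 1 .. max_n."""
    return [
        cosine_similarity(ngram_histogram(a, n), ngram_histogram(b, n))
        for n in range(1, max_n + 1)
    ]

if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown cat jumps over the lazy dog"
    for n, score in enumerate(similarity_profile(original, suspect), start=1):
        print(n, round(score, 3))
```

Scores close to 1 at the higher orders would suggest long shared runs of text; what threshold counts as suspicious is left open here.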