PerlMonks
Re^3: Brainstorming session: detecting plagiarism
by halley (Prior) on Jun 09, 2005 at 00:57 UTC [id://464917]
There are many lexicons out there, and they often include a frequency ranking for each word, derived from a large source such as the Bible or the New York Times. One popular lexicon for English is the Moby Project, which includes two such rankings. Google will point you to it.
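Each lexicon ships its own file format, and I won't guess at the Moby layout here. As a minimal sketch of where such a frequency figure comes from, you can count every word in a large reference text and divide by the total word count. The helper name `word_frequencies` and the simple letters-and-apostrophes tokenizer are assumptions for illustration, not part of any lexicon's tooling.

```python
import re
from collections import Counter


def word_frequencies(corpus_text):
    """Return each word's relative frequency (count / total words) in corpus_text."""
    # Assumed tokenizer: lowercase, then take runs of letters and apostrophes.
    # A published lexicon will have its own tokenization rules.
    words = re.findall(r"[a-z']+", corpus_text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}
```

For example, `word_frequencies("the cat sat on the mat")["the"]` is 2/6, about 0.333. With a corpus the size of the ones mentioned above, ordinary words like "statistically" end up with tiny fractions.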
To find statistically improbable word pairs, one trivial method is to multiply the frequencies of the two words in each consecutive pair (roughly, how often you would expect that pair if words were chosen independently) and look for the smallest products among the pairs that actually occur. For example, "statistically" = 0.0004 and "improbable" = 0.0003 give a very statistically improbable 0.00000012, and yet this posting uses that phrase more than once. Such pairs, especially repeated ones, are a pretty good indicator of a work's overall topics and themes.
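The pair-scoring step can be sketched in a few lines of Python. This assumes you already have a `freqs` dict mapping words to relative frequencies, for instance built from a lexicon's frequency list. The function name `improbable_pairs`, the `floor` value for words missing from the lexicon, and the `top` cutoff are hypothetical choices for illustration. The occurrence count is returned alongside the score because a low-probability pair that shows up repeatedly is the interesting case.

```python
import re
from collections import Counter


def improbable_pairs(text, freqs, floor=1e-9, top=5):
    """Score each distinct consecutive word pair in `text` by the product of
    the two words' reference frequencies; return the `top` lowest-scoring
    pairs as (pair, score, occurrences) tuples, most improbable first."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count how often each adjacent (first, second) pair occurs in the document.
    pair_counts = Counter(zip(words, words[1:]))
    scored = []
    for (first, second), count in pair_counts.items():
        # Hypothetical choice: words absent from the lexicon get a tiny
        # `floor` frequency so the product is still defined.
        score = freqs.get(first, floor) * freqs.get(second, floor)
        scored.append(((first, second), score, count))
    # Smallest product = least likely pair under the independence assumption.
    scored.sort(key=lambda item: item[1])
    return scored[:top]


if __name__ == "__main__":
    # Illustrative frequencies; only the first two figures come from the post.
    freqs = {"statistically": 0.0004, "improbable": 0.0003,
             "a": 0.02, "very": 0.001, "result": 0.0005, "is": 0.01}
    text = ("a very statistically improbable result is "
            "a very statistically improbable result")
    for pair, score, count in improbable_pairs(text, freqs):
        print(pair, f"{score:.2g}", count)
```

Run as a script, the top entry is `('statistically', 'improbable')` with a score of 1.2e-07, seen twice. Note that with the `floor` approach, any word missing from the lexicon (a typo, a proper name) will dominate the list; skipping such pairs instead is an equally reasonable choice.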
In Section: Seekers of Perl Wisdom