in reply to Re^2: Brainstorming session: detecting plagiarism
in thread Brainstorming session: detecting plagiarism
You might also want to check out Ted Pedersen's Ngram Statistics Package for the problem of improbable word pairs. Its output can easily be sorted to highlight the least likely occurrences. Of course, you would want to compare against a reference corpus (of written English, say) to get a fairly good idea of "normal" parameters.
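To make the idea concrete, here is a rough sketch in plain Python. It is not the Ngram Statistics Package itself and does not reproduce its measures. It simply counts word pairs in a reference text, scores each pair in the suspect text with an add-one smoothed log-probability, and sorts so the least likely pairs come first. The function names (`tokenize`, `bigrams`, `rank_unlikely_pairs`), the tokenizer, and the smoothing scheme are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return the list of adjacent word pairs."""
    return list(zip(tokens, tokens[1:]))


def rank_unlikely_pairs(reference_text, suspect_text):
    """Rank the suspect text's word pairs, least likely first.

    Score is an add-one smoothed log P(second | first), estimated
    from the reference text. This is an illustrative choice, not
    the measure any particular package uses.
    """
    ref_tokens = tokenize(reference_text)
    unigram_counts = Counter(ref_tokens)
    bigram_counts = Counter(bigrams(ref_tokens))
    vocab_size = len(unigram_counts) + 1  # one extra slot for unseen words

    scored = {}
    for first, second in set(bigrams(tokenize(suspect_text))):
        numerator = bigram_counts[(first, second)] + 1
        denominator = unigram_counts[first] + vocab_size
        scored[(first, second)] = math.log(numerator / denominator)

    # Lowest score first; ties are broken alphabetically for stable output.
    return sorted(scored.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    reference = "the cat sat on the mat the cat ate the fish"
    suspect = "the cat sat on the zebra"
    for pair, score in rank_unlikely_pairs(reference, suspect):
        print(f"{score:8.3f}  {pair[0]} {pair[1]}")
```

With a toy reference this small the scores mean very little; the point is only the shape of the workflow: count on a large corpus, score the suspect text, sort ascending, and look at the top of the list.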
Good luck, and keep us posted, please!