in reply to Text Analysis Tools to compare Slinker and Stinker?
Two academics came up with a clever use of zip-based compression for this type of analysis. Their scheme, which they first developed for automatic language detection but which is also useful for determining authorship, is summarized here.
Basically, they noted that if you have a chunk of text by an unknown author who belongs to a known set of candidates, and you have sample texts from each candidate, you can concatenate the unknown text with each candidate's sample and look for the concatenation that compresses best. The compressor reuses patterns it has already seen, so the unknown text should add the fewest bytes when it is appended to text by the same author.
It's a clever approach, and easily implemented in Perl.
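For anyone who wants to try the idea, here is a minimal sketch of it. It is written in Python with the standard zlib module purely for illustration, and it is not the authors' code. The names `compressed_size` and `guess_author` are made up for this example, and scoring by "extra bytes over the sample alone" is my own assumption about how to make samples of different lengths comparable.

```python
import zlib


def compressed_size(text):
    """Number of bytes in the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def guess_author(unknown, samples):
    """Return the key in samples whose text 'absorbs' unknown most cheaply.

    samples maps an author name to a sample text by that author.
    The score is the growth in compressed size when the unknown text is
    appended to the sample (an assumption of this sketch, used so that
    long and short samples are compared on the same footing).
    """
    costs = {}
    for author, sample in samples.items():
        base = compressed_size(sample)
        combined = compressed_size(sample + unknown)
        costs[author] = combined - base
    # The smallest growth means the compressor found the most reusable patterns.
    return min(costs, key=costs.get)
```

You would call it as `guess_author(mystery_text, {"Slinker": slinker_text, "Stinker": stinker_text})`. Keep in mind that zlib only looks back about 32 KB, so with long samples only the tail of each sample influences the result.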
Re: Re: Text Analysis Tools to compare Slinker and Stinker?
by Anonymous Monk on Jan 22, 2003 at 05:20 UTC