I'm hoping if I can feed a module 1000 lines of Slinker and 1000 lines of Stinker and have it say something like "These two files have a Herzenberger-Foogenboogen Written English Similarity Rating of 97%"?

Two academics came up with a clever use of zip-based compression for doing this type of analysis. Their scheme, which they first developed to do automatic language detection, but which is also useful for determining authorship, is glossed over here.

Basically, they noted that if you had a chunk of text from some author who was unknown, but who was a member of a known set, and if you had sample texts from each author in the set, you could concatenate the unknown text with text from each author, looking for the concatenation that compressed best.

It's a clever approach, and easily implemented in Perl.


In reply to Re: Text Analysis Tools to compare Slinker and Stinker? by dws
in thread Text Analysis Tools to compare Slinker and Stinker? by Cody Pendant

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.