Whilst I'm not aware of any real research, I guess this is closely related to some work that I've been doing.

 After reading about Bayesian filters for a while, and being impressed that they work so well, I started thinking of other problems that could be solved statistically.

 One thing that I often do is type the wrong word, which a spell checker doesn't catch because the word I typed is itself valid - like using "they" instead of "this". (Amazing how often I do that.)

 Another class of error is holding the shift key for too long. That made the previous sentence start with "ANother...", and it frequently produces "THe", "LIvejournal", etc.

 The first problem I've not solved, but the second can be detected and corrected if you look at frequency analysis of letter pairs.

 I've written code that sums up the chances of a given letter being followed by another given letter. So, for example, the chance of "q" being followed by "u" is 95%. The chance of "T" being followed, legitimately, by "H" is 7%.

 With a big enough sample I can flag errors with 98% accuracy - without using a dictionary.
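
 To make the idea concrete, here is a minimal sketch of the letter-pair approach. It is not my actual code, and it is written in Python purely for brevity (the same thing maps directly onto a Perl hash of hashes). The function names, the tiny training sample and the 0.1 threshold are all assumptions made up for illustration; a real run would need a much larger sample and a tuned threshold.

```python
import re
from collections import Counter, defaultdict


def train_pair_probs(text):
    """Estimate P(next letter | current letter) from text, case-sensitively."""
    pair_counts = defaultdict(Counter)
    for word in re.findall(r"[A-Za-z]+", text):
        # Count every adjacent letter pair inside each word.
        for a, b in zip(word, word[1:]):
            pair_counts[a][b] += 1
    probs = {}
    for a, followers in pair_counts.items():
        total = sum(followers.values())
        probs[a] = {b: n / total for b, n in followers.items()}
    return probs


def looks_like_shift_slip(word, probs, threshold=0.1):
    """Flag words shaped like 'THe' whose leading capital pair is rare.

    threshold is a hypothetical cut-off, not a measured value.
    """
    if len(word) < 3 or not (word[:2].isupper() and word[2:].islower()):
        return False
    # Pairs never seen in training get probability 0.0.
    return probs.get(word[0], {}).get(word[1], 0.0) < threshold


def fix_shift_slip(word):
    """Correct a flagged word by lowercasing its second letter."""
    return word[0] + word[1].lower() + word[2:]


# Toy training sample (an assumption; far too small for real use).
sample = ("The quick brown fox thought that the queen was quite quiet. "
          "Then the thing left.")
probs = train_pair_probs(sample)

flagged = [w for w in "THe queen and ANother thing".split()
           if looks_like_shift_slip(w, probs)]
fixed = [fix_shift_slip(w) for w in flagged]
```

 In this toy sample "q" is always followed by "u", and "T" is never followed by "H", so "THe" and "ANother" are flagged and can be repaired by lowercasing the second letter. No dictionary is involved, only the pair statistics.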

 Maybe this is a cool use for Perl?

Steve
---
steve.org.uk

In reply to Re: OT (for now): Mis-spelling research by skx
in thread OT (for now): Mis-spelling research by BrowserUk
