Whilst I'm not aware of any real research, I guess this is closely related to some work that I've been doing.
After reading about Bayesian filters for a while, and being impressed that they work so well, I started thinking of other problems that could be solved statistically.
One thing that I often do is misspell particular words in a way that doesn't get caught, because I'll have typed a different real word - like using "they" instead of "this". (Amazing how often I do that.)
Another class of errors is the holding of the shift key for too long. This resulted in the previous sentence starting "ANother...", and results in frequent uses of "THe", "LIvejournal", etc.
The first problem I've not solved, but the second can be detected and corrected by looking at a frequency analysis of letter pairs.
I've written code that tallies the chances of a given letter being followed by another given letter. So for example the chance of "q" being followed by "u" is 95%. The chance of "T" being legitimately followed by "H" is 7%.
With a big enough sample I can flag errors with 98% accuracy - without using a dictionary.
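The letter-pair idea above can be sketched in a few lines. This is not the author's actual code - just a minimal illustration, here in Python rather than Perl. The training corpus and the 5% cut-off are illustrative assumptions, not figures from the post:

```python
# A minimal sketch of the letter-pair (bigram) frequency idea:
# tally how often each character is followed by each other character,
# then flag pairs whose probability falls below a threshold.
from collections import defaultdict

def bigram_probs(text):
    """Count follower frequencies for each character and
    convert the counts to conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    probs = {}
    for a, following in counts.items():
        total = sum(following.values())
        probs[a] = {b: n / total for b, n in following.items()}
    return probs

def flag_unlikely_pairs(text, probs, threshold=0.05):
    """Return letter pairs in `text` whose observed probability is
    below `threshold` -- candidates for typos like 'THe'."""
    flagged = []
    for a, b in zip(text, text[1:]):
        p = probs.get(a, {}).get(b, 0.0)
        if p < threshold:
            flagged.append((a + b, p))
    return flagged

# Illustrative corpus -- a real run would train on a large text sample.
sample = "The quick brown fox jumps over the lazy dog. " * 50
probs = bigram_probs(sample)
print(flag_unlikely_pairs("THe quick brown fox", probs))
# Only the "TH" and "He" pairs are flagged: "T" is always followed
# by lowercase "h" in the corpus, and "H" never appears at all.
```

No dictionary is consulted anywhere - the detector relies purely on the pair statistics, which is the point of the approach described above.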
Maybe this is a cool use for perl?
In reply to Re: OT (for now): Mis-spelling research by skx
in thread OT (for now): Mis-spelling research by BrowserUk