in reply to Re: Junk NOT words
in thread Junk NOT words
For example, my own values for the most common letter pairs in English are these:
English => ['he','th','in','er','an','ou'],
I have found that a better-than-50% match is a pretty reliable indicator.
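Here is a rough sketch of how such a match could be scored. The post does not say how the match is computed, so this is only one guess at it: count the adjacent letter pairs in a piece of text, take its most frequent ones, and see what fraction of the language's list shows up among them. The `%common_pairs` hash name and the `pair_match` sub are made up for illustration; only the English list comes from the values above.

```perl
use strict;
use warnings;

# Hypothetical table of common letter pairs per language.
# Only the English entry is taken from the post.
my %common_pairs = (
    English => ['he','th','in','er','an','ou'],
);

# Assumed scoring: the fraction of the language's common pairs that
# also appear among the text's own most frequent letter pairs.
sub pair_match {
    my ($text, $language) = @_;
    my @wanted = @{ $common_pairs{$language} };

    # Count adjacent letter pairs inside each run of letters.
    my %count;
    for my $word (lc($text) =~ /[a-z]+/g) {
        $count{ substr($word, $_, 2) }++ for 0 .. length($word) - 2;
    }

    # Rank pairs by descending count; break ties alphabetically
    # so the result is repeatable.
    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    # Keep as many top pairs as the language list has (or fewer
    # if the text is too short to have that many).
    my $top  = @wanted < @ranked ? scalar @wanted : scalar @ranked;
    my %seen = map { $_ => 1 } @ranked[0 .. $top - 1];

    my $hits = grep { $seen{$_} } @wanted;
    return $hits / scalar @wanted;
}
```

With that in place, something like `print "looks like English\n" if pair_match($text, 'English') > 0.5;` applies the 50% rule of thumb. Very short inputs give noisy counts, so expect this to work much better on a paragraph than on a single word.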
Be aware that the path you are going down quickly leads to AI-complete problems in natural language processing. This is another one of those programming tasks that seems very easy until you try to do it on real data.