in reply to Challenge: Predictive Texting
Right, I think I understand the rules now. :-)
A subsidiary question: are we allowed to sort our data structure by frequency of paragrams (or 'textonyms', as I learn they are also called from the Wikipedia page you link to)?
If so, does anybody know of a freely available list of word frequencies in US English*? (A good UK English resource is this site, which uses the British National Corpus).
Come to think of it, the answer to this probably depends on yet another subsidiary question that I have: will the mystery text** consist of (a) more or less 'normal' English prose (albeit with punctuation and capitalisation removed) or (b) a more or less random string of words (in which case frequency considerations will be otiose)?
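To make the frequency question concrete, here is a minimal sketch (in Python, purely for illustration) of what "sorting the data structure by frequency" could look like: words are grouped by their keypad digit string, and each group of textonyms is ordered most frequent first. The names `to_digits`, `build_index` and `SAMPLE_FREQS` are hypothetical, and the counts are made-up placeholders, not figures from the British National Corpus or any other real resource.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Return the keypad digit string for a word of plain a-z letters."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(freqs):
    """Group words by digit string; order each group by descending frequency.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    index = defaultdict(list)
    for word in freqs:
        index[to_digits(word)].append(word)
    for words in index.values():
        words.sort(key=lambda w: (-freqs[w], w))
    return dict(index)


# ASSUMPTION: invented counts, for illustration only.
SAMPLE_FREQS = {
    "good": 500, "home": 400, "gone": 300, "hood": 20, "hoof": 5,
    "kiss": 50, "lips": 10,
}

index = build_index(SAMPLE_FREQS)
print(index["4663"])  # textonyms of 'good', most frequent first
print(index["5477"])  # 'kiss' and 'lips' share a digit string
```

If the mystery text turns out to be a random string of words, the ordering inside each group buys nothing, and any stable order would do just as well.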
Looking through 2of12.txt, I see that it is extremely poor in inflected forms (plurals, past tenses...) - even 'lips', which you use in a couple of your examples above, is not included - which means that it would be pretty difficult to construct a coherent text of any length consisting only of words found in the list.
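A quick way to check coverage of this kind is to list the words of a sample text that do not occur in the word list. The sketch below assumes the list is a plain text file with one word per line (an assumption about the file's layout); `load_wordlist`, `missing_words` and `TOY_WORDLIST` are hypothetical names, and the toy list stands in for the real file.

```python
def load_wordlist(path):
    """Read a word list into a set, assuming one word per line."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def missing_words(text, wordlist):
    """Return the distinct words of `text` not in `wordlist`, sorted.

    The text is assumed to be already stripped of punctuation,
    so splitting on whitespace is enough.
    """
    return sorted({w for w in text.lower().split() if w not in wordlist})


# ASSUMPTION: a tiny stand-in list, not the contents of 2of12.txt.
TOY_WORDLIST = {"read", "my", "lip"}

print(missing_words("read my lips", TOY_WORDLIST))
```

With the real file one would call `load_wordlist("2of12.txt")` instead of using the toy set.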
* Note that 2of12.txt contains few or no UK English variant spellings (no 'colour', 'criticise', 'manoeuvre'...).
** BTW, how should we parse 'between 3 and 5 thousand': 'between 3 and 5000', or 'between 3000 and 5000'? </nitpick>
Update: PS, I forgot to add this. Thanks once again for an interesting, thought-provoking challenge. Limbic~Region++!
Replies are listed 'Best First'.
Re^2: Challenge: Predictive Texting
by Limbic~Region (Chancellor) on Jan 10, 2007 at 19:55 UTC
by rhesa (Vicar) on Jan 11, 2007 at 15:01 UTC
by Limbic~Region (Chancellor) on Jan 11, 2007 at 15:51 UTC