
Not_a_Number,
...are we allowed to sort our data structure by frequency of paragrams?

Yes. In fact, the mystery text is being kept secret precisely so that this technique can't be tuned to that one text alone, which would skew the results.

If so, does anybody know of a freely available list of word frequencies in US English?

I am fairly certain I came across one this morning while researching, but I can't be sure it was US English.
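Given such a list, sorting the data structure by frequency might be sketched as follows: each word is mapped to the keypad digits that type it, and the words sharing a digit sequence are ordered most frequent first. This is a minimal sketch, assuming the standard phone keypad layout (2=abc through 9=wxyz), lowercase a to z words only, and a hypothetical `freq` dict of word counts; it is not the challenge's reference solution.

```python
from collections import defaultdict

# Standard phone keypad layout (assumed; the challenge's own mapping may differ).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word: str) -> str:
    """Return the key sequence that types a lowercase a-z word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word)


def build_index(freq: dict[str, int]) -> dict[str, list[str]]:
    """Group words by key sequence, most frequent first (ties broken alphabetically)."""
    index: dict[str, list[str]] = defaultdict(list)
    for word in freq:
        index[to_digits(word)].append(word)
    for words in index.values():
        words.sort(key=lambda w: (-freq[w], w))
    return dict(index)


# Hypothetical counts, for illustration only.
freq = {"good": 500, "home": 300, "gone": 120, "hood": 40, "hoof": 5}
index = build_index(freq)
print(index["4663"])  # ['good', 'home', 'gone', 'hood', 'hoof']
```

All five words type as 4663, so the frequency order decides which one is offered first.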

Will the mystery text consist of (a) more or less 'normal' English prose (albeit with punctuation and capitalisation removed) or (b) a more or less random string of words (in which case frequency considerations will be otiose)?

More or less US English prose.
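Since the mystery text is prose with punctuation and capitalisation removed, a candidate text could be normalised along these lines. This is a minimal sketch under an assumption the thread does not settle: apostrophes are deleted ("don't" becomes "dont") and every other non-letter character, hyphens included, separates words. The `normalise` function is hypothetical.

```python
import re


def normalise(text: str) -> list[str]:
    """Lowercase prose and split it into bare a-z words.

    Assumption: straight and curly apostrophes are deleted; any other
    non-letter character (punctuation, digits, hyphens) separates words.
    """
    text = text.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z]+", text)


words = normalise("Thanks once again for an interesting, thought-provoking challenge.")
print(" ".join(words))
# thanks once again for an interesting thought provoking challenge
```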

...which means that it would be pretty difficult to construct a coherent text of any length consisting only of words found in the list.

You are quite correct. The 2of12inf.txt list does a much better job in this area. On the other hand, if an entire book can be written without using the letter e in two different languages, I am sure it will not be too difficult to provide a mystery text of 3000 to 5000 words that meets the constraints.
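Checking that a candidate mystery text meets those constraints could look like the following. This is a minimal sketch, assuming the text has already been normalised to a list of lowercase words and the word list (for example 2of12inf.txt) has already been loaded into a set; the 3000 to 5000 word range comes from the post above, while `check_text` and its list-of-problems return value are hypothetical.

```python
def check_text(words: list[str], vocabulary: set[str],
               min_words: int = 3000, max_words: int = 5000) -> list[str]:
    """Return a list of problems; an empty list means the text passes.

    `words` is the already-normalised text and `vocabulary` the word list.
    """
    problems = []
    if not min_words <= len(words) <= max_words:
        problems.append(f"length {len(words)} is outside {min_words}-{max_words}")
    unknown = sorted({w for w in words if w not in vocabulary})
    if unknown:
        problems.append("words not in list: " + ", ".join(unknown))
    return problems


# Tiny illustration with the length limits relaxed; the vocabulary is made up.
vocabulary = {"a", "void", "is", "book"}
print(check_text(["a", "void", "is", "a", "novel"], vocabulary, min_words=1, max_words=10))
# ['words not in list: novel']
```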

Thanks once again for an interesting, thought-provoking challenge.

You're welcome.

Cheers - L~R

Re^3: Challenge: Predictive Texting
by rhesa (Vicar) on Jan 11, 2007 at 15:01 UTC
    On the other hand, if an entire book can be written without using the letter e in two different languages, (...)

    That would be the famous book by Georges Perec: A Void (originally "La disparition").