Re: memory consumption skyhigh...

Your data set could include up to 21,952,000,000,000 points (28,000 cubed) if fully populated. Even though you are working with a sparsely populated version of that possible set, the number is still going to be huge. If we assume that each word will only form digraphs with 1/5 of available words, that leaves you with 175,616,000,000 data points to count. You are suffering from a combinatorial explosion.
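
If you want to check the arithmetic yourself, here is a quick sketch (Python, only because the big integers are painless there). It assumes the 28,000-word vocabulary from the update note and that the full space is the vocabulary size cubed, which is what the first figure works out to. The function names are made up for illustration.

```python
# Rough check of the figures quoted above.
# VOCAB comes from the update note; treating the full space as VOCAB ** 3
# is an assumption that happens to reproduce the quoted number.
VOCAB = 28_000
FRACTION = 5  # assumption from the post: only 1 word in 5 is a candidate


def full_space(vocab, positions=3):
    """Possible combinations if every position can hold any word."""
    return vocab ** positions


def sparse_space(vocab, fraction, positions=3):
    """Same count when each position is limited to vocab / fraction words."""
    return (vocab // fraction) ** positions


print(f"{full_space(VOCAB):,}")              # 21,952,000,000,000
print(f"{sparse_space(VOCAB, FRACTION):,}")  # 175,616,000,000
```

Either way you slice it, that is far more than will fit in an in-memory hash.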

Zen recommended using a database. I think that is good advice. MLDBM looks like a nice fit.
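
To give a feel for what "use a database" means here, below is a minimal sketch of keeping the counts in a file on disk rather than in memory. It is not MLDBM: it uses Python's standard `shelve` module as a rough analogue (a dbm file plus a serializer), and `count_pairs`, `read_count`, and the sample sentence are all hypothetical names and data for illustration only.

```python
import os
import shelve
import tempfile


def count_pairs(words, db_path):
    """Count adjacent word pairs, storing the tallies in a dbm file on disk."""
    with shelve.open(db_path) as counts:
        for first, second in zip(words, words[1:]):
            key = f"{first} {second}"  # shelve keys must be strings
            counts[key] = counts.get(key, 0) + 1


def read_count(db_path, first, second):
    """Look up one pair's tally; pairs never seen count as 0."""
    with shelve.open(db_path) as counts:
        return counts.get(f"{first} {second}", 0)


# Tiny demonstration in a throwaway directory.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "pairs")
count_pairs("the cat saw the cat".split(), db_path)
print(read_count(db_path, "the", "cat"))  # 2
```

Only the pairs that actually occur get stored, so the sparse set costs disk space instead of RAM. The trade-off is speed, since every increment goes through the dbm layer.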

Update: Calculations based on 28000 words, not 27000. Credited Zen by name for his advice.

TGI says moo
