In reply to "statistics of a large text"

Perhaps you could try a few of the modules that deal with n-grams, then use Benchmark to see which approach performs best on your text?
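To illustrate the "try several, then benchmark" idea, here is a minimal sketch in Python. It does not use any particular n-gram module. The two counting functions (`ngrams_zip` and `ngrams_loop`) are hypothetical stand-ins for whichever modules you end up comparing, and the standard-library `timeit` module plays the role of the benchmark harness. Splitting on whitespace and lowercasing are assumptions about how you want to tokenize.

```python
import timeit
from collections import Counter
from itertools import islice


def ngrams_zip(words, n):
    """Count n-grams by zipping n staggered views of the word list."""
    return Counter(zip(*(islice(words, i, None) for i in range(n))))


def ngrams_loop(words, n):
    """Count n-grams with an explicit sliding window and a plain dict."""
    counts = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
    return Counter(counts)


def compare(text, n=2, number=20):
    """Time both methods on the same text; returns total seconds per method."""
    words = text.lower().split()  # assumed tokenization: whitespace only
    # Sanity check: both methods must agree before their speed matters.
    assert ngrams_zip(words, n) == ngrams_loop(words, n)
    return {
        name: timeit.timeit(lambda f=f: f(words, n), number=number)
        for name, f in [("zip", ngrams_zip), ("loop", ngrams_loop)]
    }


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 10_000
    for name, secs in sorted(compare(sample).items(), key=lambda kv: kv[1]):
        print(f"{name:5s} {secs:.3f}s")
```

Checking that the candidates produce identical counts before timing them matters: a faster method is only useful if it gives the same statistics. Run it against a realistic chunk of your own text, since relative speed can change with vocabulary size and n.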