IMHO, the problem is not that the input is sorted but that all the
entries are unique, so the hash keeps growing until it eats all the
memory. In common text files, most words are repetitions of words
already seen, so they don't make the hash grow.
There are several ways to solve that problem. For instance, you can try
using an on-disk tree with DB_File (its BTREE format).
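
A minimal sketch of the DB_File idea, assuming whitespace-separated
words. The sub name unique_words_btree, its arguments and the file
name words.db are made up for illustration; only tie, $DB_BTREE and
the Fcntl flags come from DB_File itself.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;    # exports O_RDWR and O_CREAT

# Hypothetical helper: reads words from $in, prints each distinct word
# once to $out, keeping the "seen" set in an on-disk B-tree at $db_path.
sub unique_words_btree {
    my ($in, $out, $db_path) = @_;

    # The tied hash lives in the file, not in memory.
    tie my %seen, 'DB_File', $db_path, O_RDWR | O_CREAT, 0666, $DB_BTREE
        or die "Cannot tie $db_path: $!";

    while (my $line = <$in>) {
        $seen{$_} = 1 for split ' ', $line;
    }

    # With the default BTREE comparison, keys come back in lexical order.
    while (my ($word) = each %seen) {
        print {$out} "$word\n";
    }

    untie %seen;
}

# Example call (words.db is an assumed file name):
# unique_words_btree(\*STDIN, \*STDOUT, 'words.db');
```

Every lookup now goes through the database file, so expect this to be
slower than an in-memory hash, but memory use should stay roughly flat.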
Another way is to flush the words found so far, sorted, to a temporary
file on disk every time their number goes over some limit and, at the
end, perform a merge sort over those files and eliminate duplicates.
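
The flush-and-merge approach could look roughly like the sketch below.
The names flush_chunk, next_word and unique_words_merge and the limit
of 100_000 words are assumptions for illustration, and the merge step
is a simple linear scan over the open files rather than anything
tuned.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the current words, sorted, to a new temp file and empty the hash.
sub flush_chunk {
    my ($seen) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);   # removed at program exit
    print {$fh} "$_\n" for sort keys %$seen;
    close $fh or die "Cannot close $name: $!";
    %$seen = ();
    return $name;
}

# Read one word from a chunk file; undef at end of file.
sub next_word {
    my ($fh) = @_;
    my $word = readline $fh;
    chomp $word if defined $word;
    return $word;
}

sub unique_words_merge {
    my ($in, $out, $limit) = @_;
    my (%seen, @chunks);

    while (my $line = <$in>) {
        $seen{$_} = 1 for split ' ', $line;
        push @chunks, flush_chunk(\%seen) if scalar(keys %seen) >= $limit;
    }
    push @chunks, flush_chunk(\%seen) if %seen;

    my @fhs;
    for my $name (@chunks) {
        open my $fh, '<', $name or die "Cannot open $name: $!";
        push @fhs, $fh;
    }
    my @heads = map { next_word($_) } @fhs;

    # Merge: repeatedly take the smallest head, skipping repeats.
    my $last;
    while (1) {
        my $min;
        for my $i (0 .. $#heads) {
            next unless defined $heads[$i];
            $min = $i if !defined $min or $heads[$i] lt $heads[$min];
        }
        last unless defined $min;
        my $word = $heads[$min];
        print {$out} "$word\n" unless defined $last and $word eq $last;
        $last = $word;
        $heads[$min] = next_word($fhs[$min]);
    }
}

# Example call (the limit is an assumed value):
# unique_words_merge(\*STDIN, \*STDOUT, 100_000);
```

Memory use is bounded by the limit plus one pending word per temporary
file. With very many chunk files, the linear scan for the minimum and
the number of open filehandles could become the bottleneck.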