in reply to Out of Memory

Without knowing why you are putting so many keys in a hash, it's hard to say what to do. (Note that what matters is not so much how many lines there are in the file, but how many different words there are, since each distinct word becomes one hash key.) One obvious saving is to chop off the newline before using the line as a key; that alone would save you a couple of MB.
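A minimal sketch, written in Python because the thread's own code isn't shown here. It assumes (hypothetically) one word per input line, and the name `count_words` is illustrative. It strips the trailing newline, much as Perl's `chomp` would, and shows that the number of keys follows the number of distinct words rather than the number of lines:

```python
import io

def count_words(lines):
    """Tally words, assuming (hypothetically) one word per input line."""
    counts = {}
    for line in lines:
        word = line.rstrip("\n")  # drop the newline, roughly what Perl's chomp does
        counts[word] = counts.get(word, 0) + 1
    return counts

# Five lines in the "file", but only two distinct words, hence two keys.
sample = io.StringIO("apple\nbanana\napple\napple\nbanana\n")
counts = count_words(sample)
print(len(counts))  # 2
print(counts)       # {'apple': 3, 'banana': 2}
```

Without the `rstrip`, every key would carry an extra `"\n"` character, which is where the small saving comes from.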

But you might consider using a disk-bound data structure instead: perhaps a database, or one of the DBM file formats. A trie was suggested as well, but I'm not sure how much it would save. Obviously, the amount of string data is reduced, but at the cost of introducing more hashes (or arrays), each of which carries quite a lot of memory overhead of its own. Whether it pays off depends on how much prefix duplication there is in the data set.
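A rough sketch of both alternatives, again in Python and under stated assumptions: the standard-library `dbm` module stands in for "one of the DB files", and the trie is built from nested dicts with `None` as a hypothetical end-of-word marker. All function names are illustrative. `count_nodes` counts how many dicts the trie allocates, which is exactly the per-hash overhead that can eat up the savings from shared prefixes:

```python
import dbm
import os
import tempfile

# --- Disk-bound alternative: keep the counts in a DBM file, not in RAM ---
def count_words_on_disk(words, path):
    """Count occurrences in a DBM file; returns a plain dict for inspection."""
    with dbm.open(path, "c") as db:  # "c": create the file if it is missing
        for word in words:
            # DBM values come back as bytes; int() accepts e.g. b"3".
            db[word] = str(int(db.get(word, b"0")) + 1)
        return {key.decode(): int(db[key]) for key in db.keys()}

# --- Trie alternative: nested dicts, one dict per distinct prefix ---
END = None  # hypothetical end-of-word marker; cannot collide with a character

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def trie_contains(root, word):
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node

def count_nodes(node):
    """Number of dicts in the trie; each one carries its own hash overhead."""
    return 1 + sum(count_nodes(child) for key, child in node.items() if key is not END)

words = ["interest", "interesting", "interested", "internet"]
trie = build_trie(words)
print(count_nodes(trie))  # 17
print(trie_contains(trie, "internet"), trie_contains(trie, "inter"))  # True False

with tempfile.TemporaryDirectory() as tmp:
    print(count_words_on_disk(words + ["internet"], os.path.join(tmp, "counts")))
```

For these four words the trie allocates 17 dicts where a flat hash would hold 4 keys, so on data with little prefix sharing it can easily cost more memory than it saves. The DBM version trades RAM for disk I/O instead.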