in reply to: hashref population yields out of memory error

You probably need to re-think your entire algorithm.

While it is very tempting to “stuff it all into a hashref and get it back with random-access,” this is not a good approach to take when faced with very large amounts of data.

“Memory,” after all, is virtual: once your hash grows beyond the physical RAM that is available, the operating system starts backing it with disk. As you seek through the structure randomly, page faults pile up and the system can slow down precipitously, the classic “thrashing” scenario.

A much better approach when faced with large amounts of data is to employ a disk-based sort. Yes, I am talking about sequential files! When you know that the two files being compared are identically sorted, you can walk through both of them in a single sequential pass, and the comparison becomes very fast. Furthermore, sorting itself is “unexpectedly cheap”: an external merge sort does its work with purely sequential I/O, so the total run-time, even counting the two sorts, can be markedly less than you might imagine. (Think in terms of run-times dropping from “several hours” to “minutes,” or maybe even seconds.)
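For what it is worth, here is a minimal sketch of that merge-style comparison in Perl. The file names (file_a.sorted, file_b.sorted) are invented for illustration, and I am assuming each file holds one key per line and has already been sorted identically, say with the standard sort(1) utility:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical inputs: one key per line, both sorted the same way.
    open my $fh_a, '<', 'file_a.sorted' or die "file_a.sorted: $!";
    open my $fh_b, '<', 'file_b.sorted' or die "file_b.sorted: $!";

    my $rec_a = <$fh_a>;
    my $rec_b = <$fh_b>;

    # Walk both files forward in lock-step: one sequential pass, constant memory.
    while (defined $rec_a and defined $rec_b) {
        chomp(my $key_a = $rec_a);
        chomp(my $key_b = $rec_b);
        if    ($key_a lt $key_b) { print "only in A: $key_a\n"; $rec_a = <$fh_a>; }
        elsif ($key_a gt $key_b) { print "only in B: $key_b\n"; $rec_b = <$fh_b>; }
        else                     { print "in both:   $key_a\n";
                                   $rec_a = <$fh_a>;
                                   $rec_b = <$fh_b>; }
    }

    # Drain whichever file still has records left.
    while (defined $rec_a) { chomp $rec_a; print "only in A: $rec_a\n"; $rec_a = <$fh_a>; }
    while (defined $rec_b) { chomp $rec_b; print "only in B: $rec_b\n"; $rec_b = <$fh_b>; }

Each input is read exactly once, front to back, so memory use stays constant no matter how many millions of records are involved, which is precisely the property the giant hashref cannot give you.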

This is how data was processed with punched cards, long before digital computers were invented. It's also what the computers were doing in all those sci-fi movies from the 1960s, with all those tapes spinning merrily along and ... you may have noticed ... never going backwards. (What those tape drives were performing was a “tape sort,” typically a polyphase merge sort, and the technique still works.)