in reply to Running out of resources while data munging
I'm not answering your question in this node (others have already done so), just offering a tip I find useful in that situation:
I frequently rip through huge files to collect statistics, split the data among multiple outputs, etc. What I often find useful is to run the data through sort first. If the task suggests a good sort key, this can often remove the need to collect all the information into a hash first: once the file is sorted, all the records for a given key are adjacent, so I only need to hold the state for one key at a time. It doesn't usually cost much time, because when I then rip through the sorted file, it's typically still sitting in the OS file cache, so the second pass reads from RAM rather than disk.
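To make the idea concrete, here is a minimal sketch in Python (not taken from the original post). It assumes a hypothetical record layout of one `key<TAB>number` pair per line, and that the input has already been sorted by key; the function name `summarize_sorted` is made up for illustration. Because equal keys are adjacent, it keeps only a running count and total for the current key instead of a hash of every key.

```python
from itertools import groupby


def summarize_sorted(lines):
    """Yield (key, count, total) for each run of equal keys.

    Assumes the input is already sorted (or at least grouped) by key,
    and that each line looks like "key<TAB>number" (hypothetical layout).
    """
    # Split each line into [key, value] lazily; nothing is held in memory
    # beyond the record currently being processed.
    records = (line.rstrip("\n").split("\t", 1) for line in lines)

    # groupby only merges *adjacent* equal keys, which is exactly what
    # pre-sorting guarantees.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        count = 0
        total = 0.0
        for _, value in group:
            count += 1
            total += float(value)
        yield key, count, total
```

Passing `sys.stdin` as `lines` lets this sit at the end of a pipeline after `sort`. Note the trade-off: if the input is not actually sorted, the same key shows up in several separate runs and gets reported more than once, so the sort step is what makes the single-key state safe.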
...roboticus