in reply to Questions about efficiency
If the information is record-oriented, and you don't need to see all the data before you start processing it (hint: think sorting), then the least stressful approach for the system is to read a line, deal with it, read a line, deal with it.
This has the smallest memory footprint.
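A minimal sketch of the read-a-line, deal-with-it loop, in Python rather than Perl. The comma-separated record layout and the per-record "work" (upper-casing the first field) are hypothetical placeholders. Because it consumes any iterable of lines, passing an open file handle reads the file lazily, so only the current record is held in memory.

```python
import io
from typing import Iterable, Iterator


def process_records(lines: Iterable[str]) -> Iterator[str]:
    """Handle one record at a time; only the current line is held in memory."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank records
        # Hypothetical per-record work: upper-case the first field.
        fields = line.split(",")
        fields[0] = fields[0].upper()
        yield ",".join(fields)


if __name__ == "__main__":
    # With a real file: `with open(path) as fh: for rec in process_records(fh): ...`
    data = io.StringIO("alice,3\nbob,5\n\ncarol,7\n")
    for out in process_records(data):
        print(out)
```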
If you need to see all of the data before processing any of it (e.g., comparing a value to the arithmetic mean), then try to save only what you need. Rather than checking whether a record meets your requirements and then saving it out whole, reformat it so that you keep only the fields you need, in the format most efficient for subsequent processing (hint: think epoch seconds).
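As an illustration of saving only what you need, here is a sketch (Python, not the original poster's code) that assumes each record starts with a `YYYY-MM-DD HH:MM:SS` timestamp in UTC followed by a numeric value, with any further fields discarded. Each record is reduced to an `(epoch_seconds, value)` pair before being kept, and the mean is computed over those compact pairs.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def parse_record(line: str) -> Tuple[int, float]:
    # Assumed layout: "YYYY-MM-DD HH:MM:SS,value,<other fields ignored>"
    stamp, value = line.split(",")[:2]
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Store the timestamp as integer epoch seconds: compact and easy to compare.
    return int(dt.timestamp()), float(value)


def above_mean(lines: Iterable[str]) -> List[Tuple[int, float]]:
    # Keep only the two fields we need, already converted to compact types.
    kept = [parse_record(line) for line in lines if line.strip()]
    if not kept:
        return []
    mean = sum(v for _, v in kept) / len(kept)
    return [(t, v) for t, v in kept if v > mean]
```

This still holds every kept pair in memory; it just makes each one as small as possible.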
If you have only a little data, you may be able to get away with stashing it in a hash, but sooner or later you will hit a big file, eat all your swap, and your system will die a horrible, lingering death.
You are better off writing it to another file and then reading it back in. On a lightly loaded machine with a modern operating system, most of the file will still be floating around in RAM (in the OS file cache) anyway, so reading it back won't be all that slow.
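A sketch of the spill-to-a-file approach, again in Python and assuming input lines of the form `epoch_seconds,value`. The first pass writes compact tab-separated records to a temporary file and keeps only a running sum and count; the second pass rereads the file and yields records above the mean. Memory use stays constant regardless of input size, and the reread is usually served largely from the page cache, as described above.

```python
import tempfile
from typing import Iterable, Iterator, Tuple


def filter_above_mean_via_file(lines: Iterable[str]) -> Iterator[Tuple[int, float]]:
    total = 0.0
    count = 0
    with tempfile.TemporaryFile(mode="w+") as spill:
        # Pass 1: write compact "epoch<TAB>value" records; keep only running totals.
        for line in lines:
            if not line.strip():
                continue
            epoch, value = line.split(",")[:2]
            spill.write(f"{int(epoch)}\t{float(value)}\n")
            total += float(value)
            count += 1
        if count == 0:
            return
        mean = total / count
        # Pass 2: rewind and reread the spill file one record at a time.
        spill.seek(0)
        for rec in spill:
            epoch, value = rec.split("\t")
            if float(value) > mean:
                yield int(epoch), float(value)
```

The temporary file is deleted automatically when the generator finishes or is closed.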
update: Code Smarter is on the same wavelength. In fact, the points made there are pretty well the same points I made in my lightning talk @ YAPC::Europe 2000.