Re: Questions about efficiency

If the information is record oriented, and if you don't need to see all the data before starting to process it (hint: think sorting), then the least stressful way on the system is to read a line, deal with it, read a line, deal with it.

This has the smallest memory footprint.
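To make the read-a-line, deal-with-it loop concrete, here is a minimal sketch in Python (chosen purely for illustration; the same shape works in any language). The function name `count_matching`, the sample records, and the "starts with error" test are all made up for the example.

```python
import io

def count_matching(lines, predicate):
    """Stream over lines, holding only one line and a counter in memory."""
    count = 0
    for line in lines:  # a file object hands back one line per iteration
        if predicate(line.rstrip("\n")):
            count += 1
    return count

# Stand-in for a real file handle; open("some.log") would behave the same way.
sample = io.StringIO("ok 1\nerror 2\nok 3\nerror 4\n")
errors = count_matching(sample, lambda rec: rec.startswith("error"))
```

Memory use here depends on the longest line, not on the size of the file.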

If you need to see all of the data before processing any of it (e.g., comparing a value to the arithmetic mean) then try to save only what you need. Rather than checking whether a record meets your requirements and then saving the whole thing, reformat it so that you keep only the fields you need, in the format most efficient for subsequent treatment (hint: think epoch seconds).
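As a rough illustration of "save only what you need", the sketch below boils a wide record down to two numbers. The record layout (tab-separated timestamp, host, value, free text), the assumption that timestamps are UTC, and the name `compact` are all hypothetical; adjust them to whatever your data actually looks like.

```python
from datetime import datetime, timezone

def compact(record):
    """Reduce 'timestamp<TAB>host<TAB>value<TAB>...' to (epoch_seconds, value)."""
    stamp, _host, value, *_rest = record.split("\t")
    # Assumes the timestamp is UTC; parse it once so later passes compare integers.
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(when.timestamp()), float(value)

epoch, value = compact("2000-09-22 12:00:00\tweb01\t3.5\tlots of text we never use")
```

An integer and a float are far cheaper to store and compare than the original line, and the date parsing is paid for only once.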

If you have only a little data then you may be able to get away with stashing it in a hash, but sooner or later you will hit a big file, you will eat all your swap, and your system will die a horrible lingering death.

You are better off writing it to another file, and then reading it back in. On a lightly loaded machine with a modern operating system, most of the file will remain floating around in RAM anyway, so it won't be all that slow to read it back.
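Here is one way the write-it-out-then-reread approach might look, using the compare-to-the-mean case from earlier. It is only a sketch: the function name `above_mean` and the one-value-per-line spool format are assumptions made for the example.

```python
import tempfile

def above_mean(records):
    """Yield the values greater than the mean, spooling through a temp file."""
    total = 0.0
    count = 0
    with tempfile.TemporaryFile(mode="w+") as spool:
        # Pass 1: accumulate the sum and write each value out to disk.
        for value in records:
            total += value
            count += 1
            spool.write(f"{value!r}\n")
        if count == 0:
            return
        mean = total / count
        # Pass 2: rewind and stream the values back in, one line at a time.
        spool.seek(0)
        for line in spool:
            value = float(line)
            if value > mean:
                yield value

result = list(above_mean(iter([1.0, 2.0, 3.0, 10.0])))
```

Only the running total, the count, and the current line are ever held in memory, so the input can be as large as the disk allows.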

update: Code Smarter is on the same wavelength. In fact, the points made there are pretty well the same points I made in my lightning talk @ YAPC::Europe 2000.
