in reply to need to optimize my sub routine
I don't know about Windows, but on Unix/Linux/macOS you can run "top" in a separate terminal while your process is active, and watch what happens in terms of memory consumption and page faults, in addition to overall CPU load.
Perl data structures (like your AoHoH "@data") take up a lot more space than you might expect. 120 MB of data in disk files can easily occupy two or three (or more) times that amount inside the perl process, because of the per-scalar and per-container bookkeeping that comes with managing nested structures.
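If you want to quantify that overhead, the CPAN module Devel::Size can report the actual in-memory footprint of a structure (an assumption on my part: it's not in core perl, so you may need to install it first). A rough sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Size qw(total_size);   # CPAN module, not in core perl

# One CSV row stored as a hash: the raw field text is only a handful
# of bytes, but each scalar and the hash itself carry bookkeeping
# overhead, so total_size() reports considerably more.
my %row = ( sku => 'A100', desc => 'widget', qty => 3 );

printf "raw field text: %d bytes\n", length join '', values %row;
printf "in-memory size: %d bytes\n", total_size( \%row );
```

Multiply that per-row overhead by every row of every file and the gap between "120 MB on disk" and what top shows stops being surprising.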
So figure out whether you really need all of your file data to be (virtual) memory resident at the same time in that one huge AoHoH structure. Maybe only some of the fields from each csv row need to be kept, or maybe the processing can be done serially (i.e. while reading each file)?
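Here's a minimal sketch of the serial approach: keep a small running aggregate per key instead of accumulating every row of every file. The field names and positions are invented for illustration, and the naive split assumes no quoted fields containing commas (a real-world CSV should go through Text::CSV instead):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream rows one at a time, keeping only the two fields we need,
# instead of loading everything into a huge AoHoH first.
sub sum_by_key {
    my ($fh) = @_;
    my %total;
    my $header = <$fh>;                    # skip the header row
    while ( my $line = <$fh> ) {
        chomp $line;
        # naive split; use Text::CSV if fields may contain commas
        my ( $key, $value ) = ( split /,/, $line )[ 0, 2 ];
        $total{$key} += $value;
    }
    return \%total;
}

# Works on a real file handle, or (as here) on an in-memory string:
my $csv = "sku,desc,qty\nA100,widget,3\nB200,gadget,2\nA100,widget,4\n";
open my $fh, '<', \$csv or die $!;
my $totals = sum_by_key($fh);
printf "%s => %d\n", $_, $totals->{$_} for sort keys %$totals;
# prints:
#   A100 => 7
#   B200 => 2
```

Memory use is then bounded by the number of distinct keys, not by the total row count across all 120 MB of input.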