woland99 has asked for the wisdom of the Perl Monks concerning the following question:

Howdy - I have a problem with creeping memory usage in a script I wrote. I apologize that the problem description is mostly prose and no code, but the script is 6000 lines long and would not make much sense to post here. I would really appreciate some general guidance on how to write such scripts without hitting a memory wall.

I wrote a script to provide extracts from a set of data files containing country/date information for some products. Each product has three type-related characteristics: product line, product name, and feature. For each combination of the three characteristics there is a countryDate string that contains information about the dates and countries in which that prodLine|prodName|Feature can be ordered.
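
Schematically, the data looks something like this (the names and the countryDate format here are invented for illustration - the real files are messier):

    # prodLine -> prodName -> Feature -> countryDate string
    my %catalog = (
        ProdLineA => {
            Widget1 => {
                FeatureX => 'DE:2012-05-01..2012-10-15;FR:2012-03-01..2012-12-31',
            },
        },
    );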

My task is, for a given set of criteria involving a set of countries, dates, and some additional prodLine-dependent logic, to produce an extract - for example, find all prodName|Feature combinations from a given product line (or lines) that were "available" in Germany between May 1st and Oct 15th, 2012. Whether something is "available" can usually be established by extracting the dates related to a given country from the countryDate string and comparing them with some input arguments (the actual implementation is more complex - there are many sub-cases to check, depending on the product line and the type of availability - but those are details; it is all encapsulated in functions that check the applicable filtering criteria).
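
Against the toy format above, the core of one such check would be roughly the following (the real grammar has many more cases; note that ISO-style dates happen to compare correctly as plain strings):

    # does this countryDate string overlap [$from, $to] in $country?
    sub available_in {
        my ($cd_string, $country, $from, $to) = @_;
        my ($start, $end) = $cd_string =~ /\b\Q$country\E:([^.;]+)\.\.([^;]+)/
            or return 0;    # country not listed at all
        return $start le $to && $end ge $from;    # windows overlap
    }

e.g. available_in($cd, 'DE', '2012-05-01', '2012-10-15') for the Germany example.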

There are about 100,000 items in the set of files, i.e. 100,000 separate prodLine|prodName|Feature items. My script starts by reading all the data into a hash indexed by prodLine, prodName, and Feature. I was careful not to store the actual countryDate values in the hash, since the set of distinct values is much smaller than 100,000 (on the order of 8,000 distinct countryDate strings).
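
Concretely, I pool the distinct strings and store references to them, roughly like this (a sketch, not the actual code):

    my %cd_pool;    # each distinct countryDate string stored exactly once
    my %catalog;    # prodLine -> prodName -> Feature -> ref into the pool

    sub add_record {
        my ($line, $name, $feature, $cd) = @_;
        $cd_pool{$cd} //= $cd;    # intern the string on first sight
        $catalog{$line}{$name}{$feature} = \$cd_pool{$cd};
    }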

The main part of the program loops over prodLine, prodName, and Feature, then over the set of countries to extract, and checks the filtering criteria for availability - each check usually involves 3-4 calls to date comparison routines from the Date::Manip module. So on a typical run the script makes close to 6-8x10^5 calls to functions in that module. I could probably cut that drastically by caching the results in a hash: there are about 4 input dates and maybe 1000 distinct dates in the countryDate strings, so caching would cut the number of function calls to roughly 4000.
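
The caching I have in mind would look something like this (a sketch; my real comparison routine wraps several such calls):

    use Date::Manip;    # exports ParseDate, Date_Cmp, ...

    my %cmp_cache;      # "rawDate1|rawDate2" -> comparison result

    # parse and compare each distinct pair of date strings only once
    sub cmp_dates {
        my ($d1, $d2) = @_;
        return $cmp_cache{"$d1|$d2"} //=
            Date_Cmp(ParseDate($d1), ParseDate($d2));
    }

The core Memoize module (memoize('cmp_dates')) would give the same effect without the explicit hash.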

But this is the part where memory climbs steadily - on Win7 the memory taken by the perl process grows from about 120MB to close to 300MB (and that is with just two countries checked - if I try the ALL-countries report the machine chokes after Perl takes 3GB of memory). I reset all loop-specific auxiliary arrays and hashes between the appropriate passes through the loops.
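
For reference, this is the pattern I use for the per-pass scratch structures, plus the only memory-probing idea I have found so far (Devel::Size - untested on my side, so treat the measuring part as a sketch):

    use Devel::Size qw(total_size);

    for my $country (@countries) {
        my (%scratch, @hits);    # fresh lexicals for each pass
        # ... availability checks for this country ...

        warn "scratch: ", total_size(\%scratch), " bytes\n";
    }   # lexicals go out of scope here and their entries are freed
        # (perl reuses freed memory internally but rarely returns it
        # to the OS, so the process size reflects the high-water mark)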

I realize I am not giving you a lot of specifics here, but I would appreciate any remarks or general ideas on how to loop through a large data set efficiently and how to track memory usage and leaks. Thanks

