Qiang has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am gathering data into a hash and have run into Perl's memory free-up issue (memory is never returned to the system unless I reboot). I thought I would use the Storable module and break the data collection into steps, storing each step's collected data to local disk to minimize the issue, but that doesn't seem to make things any better.

use File::Find;
use Storable qw(store retrieve);

my $mbox = {};
foreach my $usr (@usr_dirs) {
    chomp $usr;
    my $usr_dir = $base_dir . "/" . $usr;
    undef $mbox;
    $mbox = retrieve($storable_file) if -e $storable_file;
    find(\&wanted, $usr_dir);
    store $mbox, $storable_file or die "Can't store!\n";
}

The find subroutine does all the data collection. Before coming up with the above code, I stored all the data into the $mbox hash at once and then saved it to local disk; the memory usage was really high. I was hoping that by breaking it down into store/retrieve/store steps, the memory usage issue could be resolved.

Also, if I am not mistaken, once $mbox gets undef'd, the memory that was allocated for it goes back to the memory pool. So shouldn't the overall maximum memory usage be max(batch_1_mem_usage, batch_2_mem_usage, ..., batch_n_mem_usage)?
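For reference, here is a self-contained sketch of that store/retrieve-per-batch idea. The temp file and the stand-in data are placeholders (the real code's wanted() callback isn't shown above); the point is that a lexical $mbox created fresh each pass leaves no reference to the previous batch:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(store retrieve);

my ($fh, $storable_file) = tempfile(UNLINK => 1);
close $fh;

for my $batch (1 .. 3) {
    # Start from the on-disk snapshot (or a fresh hash on the first pass).
    my $mbox = -s $storable_file ? retrieve($storable_file) : {};
    $mbox->{"user$batch"} = { size => $batch * 100 };   # stand-in for find(\&wanted, ...)
    store($mbox, $storable_file) or die "Can't store: $!";
    # $mbox goes out of scope here; its memory returns to perl's pool
}

my $final = retrieve($storable_file);
print scalar(keys %$final), " users stored\n";   # 3 users stored
```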

Replies are listed 'Best First'.
Re: hash and memory usage
by zentara (Cardinal) on Jan 08, 2005 at 13:19 UTC
    Also, if I am not mistaken, once $mbox gets undef'd, the memory that was allocated for it goes back to the memory pool.

    That is a common mistake, and I usually find a way around it, although I don't fully understand it myself.

    The memory is returned when there are no further references to the hash, anywhere in your code. There are also other complications: Perl will release the memory for reuse, but it won't release it to the system until the program ends. So if you run a small test script filling the hash up to 30 megs, then undef the hash, the Perl process will stay at 30 megs; but it will not grow any further if you load a second 30-meg hash, because it will reuse its internal allocations.

    But it sounds more like you have "hidden internal references" to the hash, probably buried down in one of the modules. It sounds like a good plan to load a small hash, save it to a database, then reuse the hash for the next batch; but you have to be very careful about deleting the hash. Do a Super Search for "autovivification"; you may be getting bitten by that. It's possible that one of the modules is somehow keeping an empty hash element around. Even though it is empty, it is a reference into the old hash and prevents the memory release.
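A small illustration of the autovivification trap mentioned above: merely *reading* a nested key creates the intermediate hash, which is one way an "empty" structure quietly stays populated:

```perl
use strict;
use warnings;

my %h;
my $x = $h{a}{b};                 # reading a nested key autovivifies $h{a}
print exists $h{a} ? "yes" : "no", "\n";   # prints "yes" -- %h is no longer empty

# Even exists() on the deepest level vivifies the intermediate path:
my %g;
my $e = exists $g{a}{b};
print exists $g{a} ? "yes" : "no", "\n";   # prints "yes" again
```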

    You have to plan carefully for memory reuse; it won't happen magically by undef'ing things, the way it would in C or C++.
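A sketch of the "no further references" rule, using a hypothetical Tracker class with a DESTROY hook to make the release visible. undef on one variable does nothing while a second reference survives:

```perl
use strict;
use warnings;

package Tracker;
our $destroyed = 0;
sub new     { bless {}, shift }
sub DESTROY { $destroyed = 1 }     # fires only when the last reference is gone

package main;

my $hash  = { payload => Tracker->new };
my $alias = $hash;                 # a "hidden" second reference

undef $hash;
print "after first undef: destroyed = $Tracker::destroyed\n";    # still 0

undef $alias;                      # last reference gone -- DESTROY fires now
print "after second undef: destroyed = $Tracker::destroyed\n";   # now 1
```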


    I'm not really a human, but I play one on earth. flash japh
Re: hash and memory usage
by osunderdog (Deacon) on Jan 07, 2005 at 23:24 UTC
      I read it through, and it appears that separating the memory-consuming part into another script and then exec'ing it is one solution. But I don't see the memory getting released by doing it.

      I am using Parallel::ForkManager in file1.pl, btw.

      file1.pl

      use Parallel::ForkManager;

      my $pm = new Parallel::ForkManager(4);
      foreach my $base_dir (@base_dirs) {
          my $storable_file = $storable_dir.$f;
          $pm->start and next;    # do the fork
          `/home/qiang/file2.pl $storable_file $base_dir`;
          $pm->finish;
      }
      $pm->wait_all_children;

      file2.pl

      use File::Find;
      use Storable qw(store retrieve);

      my $storable_file = shift;
      my $base_dir      = shift;
      my @usr_dirs = map { chomp; $base_dir . "/" . $_ } `/bin/ls $base_dir`;
      my $mbox = {};
      find(\&wanted, @usr_dirs);
      store $mbox, $storable_file or die "Can't store !\n";

        How are you determining that memory is being used?

        Also, looking at your file1.pl, I think you probably want an exec here rather than using backticks.

        exec('/home/qiang/file2.pl', $storable_file, $base_dir);
        Because, according to the docs, backticks do a fork and an exec. So you're probably doing a fork from ForkManager, a fork from backticks, and then an exec from backticks. If that makes any sense.
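A minimal fork-then-exec sketch using only core Perl (no Parallel::ForkManager, and a trivial shell command standing in for file2.pl) to show the pattern: the child replaces itself with the external command rather than spawning yet another process via backticks:

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: exec replaces this process image entirely; nothing after a
    # successful exec ever runs in the child.
    exec('/bin/sh', '-c', 'exit 7') or die "exec failed: $!";
}

# Parent: reap the child and read its exit status.
waitpid($pid, 0);
my $status = $? >> 8;
print "child exited with status $status\n";   # child exited with status 7
```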


        "Look, Shiny Things!" is not a better business strategy than compatibility and reuse.