jaa has asked for the wisdom of the Perl Monks concerning the following question:

I am experiencing very slow exits from scripts and believe this is the result of Perl calling libc's free() many hundreds of thousands of times at exit.

The faster free() in the new libc6 2.3.1 makes the slow-exit problem disappear. Unfortunately, libc6 2.3.1 causes MySQL connection failures after heavy memory usage, so I am stuck with 2.2.5 for the moment.

I am therefore trying to figure out whether it is possible to pre-allocate a large array of hashes.

I can pre-allocate an array or a hash with

    $#myarray = 250000;
    keys %myhash = 120;
How would I pre-allocate an array of 250,000 hashes, each with 120 fields?

How many buckets should be pre-allocated for a 120-field hash?
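
For what it's worth, a quick experiment here suggests perl rounds the requested bucket count up to the next power of two, so asking for 120 actually gets 128 (probe code below; the key name is just a stand-in):

    my %myhash;
    keys %myhash = 120;            # ask for room for 120 keys
    $myhash{probe} = 1;            # scalar(%h) reports buckets only once non-empty
    print scalar(%myhash), "\n";   # prints "1/128": used/total buckets,
                                   # rounded up to a power of two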

TIA

Jeff

Re: How do I pre-allocate an array of hashes?
by Abigail-II (Bishop) on Feb 21, 2003 at 17:06 UTC
    Something like:
    $#myarray = 250000;
    keys %$_ = 120 for @myarray;   # %$_ autovivifies each slot to a hash ref

    Abigail

      Thanks, but I'm looking for something a bit more chunky that reduces the number of times Perl calls malloc.

      250,000 calls to

      keys %myhash = 120;
      is not much better than just preallocating the array and allowing the individual hashes to get created one at a time.

        The old C trick I used to see was to calculate the total malloc arena size needed, then malloc one huge chunk of that size at startup and immediately free it. The idea was that subsequent smaller mallocs would be carved out of that existing arena. Since actual process memory, obtained (on UNIX) via sbrk, is one-way only (you can't give it back), malloc maintains its free space in its own internal arena. This approach supposedly traded all those expensive small system calls for one up front.

        In practice I saw mixed results with this, and I'm sure malloc algorithms have changed since, so it may mean nothing at all today even if you can figure out how to translate it to Perl. Maybe it would be better to use an alternate malloc package when building Perl?
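
        That said, the closest Perl-level translation I can think of is to build and discard a throwaway structure of the same shape at startup - a rough, untested sketch resting on the assumption that perl recycles SV and hash-entry bodies on its own internal free lists (the bucket arrays themselves still go back through free()):

            # Prime perl's internal free lists with a throwaway structure
            # shaped like the real data (untested sketch).
            my @prime;
            for my $i (0 .. 249_999) {
                my %h;
                keys %h = 120;      # same bucket count as the real hashes
                $prime[$i] = \%h;
            }
            @prime = ();            # freed SV/HE bodies stay on perl's free
                                    # lists, not necessarily back to libc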

        Well, Perl isn't C. Maybe you should consider using a different datastructure.
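
        For example, since every record apparently carries the same 120 fields, one shared field-name-to-index map plus a plain array per record avoids creating 250,000 separate hashes entirely - an untested sketch, with made-up field names:

            # One shared name -> index map instead of a hash per record.
            my @fields = map { "field$_" } 0 .. 119;   # stand-in field names
            my %idx;
            @idx{@fields} = 0 .. $#fields;

            my @records;
            $#records = 250000;                        # pre-extend the array

            $records[0][ $idx{field7} ] = 'some value';   # store a field
            my $v = $records[0][ $idx{field7} ];          # fetch it back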

        Abigail

Re: How do I pre-allocate an array of hashes?
by Elian (Parson) on Feb 21, 2003 at 18:33 UTC
    The short answer is you can't. There's nothing you can do at the user level to alter the way perl allocates memory for the base hash structures, and so you're stuck with the performance characteristics as long as you use the busted version of glibc.

    The alternatives are either to rebuild perl with perl's own malloc rather than the system one (it doesn't have these issues), or to find another malloc to link against instead of the one in glibc (which is somewhat problematic, but doable).

      Thanks - other research bears this out too - will try rebuilding the Debian stable Perl with -Dusemymalloc.
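
      For anyone checking an existing build first, the Config module reports which malloc perl was compiled with:

          # Prints 'y' if this perl was built with -Dusemymalloc
          use Config;
          print $Config{usemymalloc}, "\n";

      (For a from-source rebuild, the flag goes on the Configure line, e.g. sh Configure -des -Dusemymalloc.)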

      Incidentally, as an example: one script takes 14 minutes to decrypt, translate, and save thousands of records into the database, and then sits around for 22 minutes freeing the large hashes before exiting.

      Jeff

        Yeah, this is due to the insane number of individual allocations that perl does when building hashes. It's something I'm looking into fixing for perl 6.
Re: How do I pre-allocate an array of hashes?
by hv (Prior) on Feb 21, 2003 at 18:49 UTC

    The reason perl goes around trying to free everything up at exit is primarily to ensure that any remaining objects with DESTROY methods have them called appropriately.

    I wouldn't particularly recommend this approach, but if you are confident that you have no objects needing DESTROY at exit, you can avoid the cleanup by calling POSIX::_exit instead at the end of the program.
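
    Something along these lines at the very end of the program - bearing in mind that _exit also skips END blocks and stdio flushing, so flush anything important yourself:

        use POSIX ();

        $| = 1;             # unbuffer STDOUT: POSIX::_exit() won't flush for you

        # ... do the real work ...

        POSIX::_exit(0);    # ends the process immediately: no global
                            # destruction, no DESTROY calls, no END blocks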

    Hugo
      I wondered if we should blame Perl's garbage collection too - but our tests replacing libc6 2.2.5 with 2.3.1 show that Perl is not to blame; the problem is in libc6's memory management.

      Thanks for the pointer to POSIX::_exit - but it only helps if the problem occurs at the end of processing; in fact we have the 'slow free' problem both at the end and during processing.

      Jeff