hemantj has asked for the wisdom of the Perl Monks concerning the following question:

It's been observed that freeing large hashes with undef on Linux takes quite a lot of time, more than 10 times as long as loading them took. Is there something I am missing?
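A minimal reproduction, assuming Time::HiRes is available (it's a CPAN module for 5.005), looks something like this; the hash size is only illustrative:

    use Time::HiRes qw(gettimeofday tv_interval);

    my %h;
    my $t0 = [gettimeofday];
    $h{$_} = $_ for 1 .. 1_000_000;    # load a large hash
    printf "load:  %.2fs\n", tv_interval($t0);

    $t0 = [gettimeofday];
    undef %h;                          # free it again
    printf "undef: %.2fs\n", tv_interval($t0);
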
I use Perl 5.005 on Red Hat Linux. The output of perl -V is:
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
Platform:
osname=linux, osvers=2.4.4smp, archname=i686-linux
uname='linux mglnxcs02 2.4.4smp #14 smp thu may 17 09:57:43 edt 2001 i686 unknown '
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef useperlio=undef d_sfio=undef
Compiler:
cc='/usr/bin/gcc -Dbool=char -DHAS_BOOL', optimize='-O2', gccversion=2.96 20000731 (Red Hat Linux 7.1 2.96-87)

Re: freeing hashes on Linux
by Abigail-II (Bishop) on May 22, 2003 at 15:13 UTC
    That's due to Perl's memory policies. When you undef a large hash, Perl has to, for each value stored, decrement the value's reference count, check whether it has reached zero, and if so, free its memory (possibly decrementing other reference counts in turn). Finally, it needs to free the memory structure of the hash itself, which consists of many linked lists, so there are lots of little pieces of memory to be freed.
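
    You can watch the reference counting at work with Devel::Peek (in the core since 5.005; the exact Dump output varies between perl versions):

        use Devel::Peek;

        my $value = "shared";
        my %h = ( a => \$value, b => \$value );
        Dump($value);   # REFCNT = 3: the variable plus two hash entries
        undef %h;       # each bucket entry is walked and freed,
                        # dropping REFCNT on $value twice
        Dump($value);   # back to REFCNT = 1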

    Abigail

      Hi Abigail! How about clearing all key/value pairs with:
      %myhash = ();
      How does this work internally?

      Thanks, Michele.
        It works the same way. The killer is freeing up all the bucket entries, and when you undef the hash they get freed just the same as when the hash goes out of scope. There's rather a lot of small memory malloc'ing going on (with corresponding extra overhead in memory footprint that generally goes unaccounted for), and freeing all those small allocations triggers pathological behaviour in some versions of glibc's memory system.
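
        A rough way to see that both forms do the same per-entry work (times() is in the core; the numbers are illustrative only):

            my %a = map { $_ => $_ } 1 .. 500_000;
            my %b = map { $_ => $_ } 1 .. 500_000;

            my @t = times;
            %a = ();     # frees every entry, keeps the bucket array
            printf "assign: %.2fs\n", (times)[0] - $t[0];

            @t = times;
            undef %b;    # frees every entry and the bucket array too
            printf "undef:  %.2fs\n", (times)[0] - $t[0];
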
Re: freeing hashes on Linux
by jaa (Friar) on May 22, 2003 at 15:32 UTC
    We had the same problem, which we traced to a buggy version of GCC; our production compiler is the buggy gcc 2.95.4.

    The solution was to

    1) Use a GLOBAL scope for the offending large hash. I know it runs against the grain, but you DON'T want it to go out of scope, or your process will sit around for ages running buggy malloc() reshuffles.

    2) Use POSIX::_exit($exitval) instead of exit(). This exits immediately, without bothering with the final garbage collection (see the sketch below).

    For us, with hashes of 200 MB to 1.2 GB, this reduced task times from 35 minutes to 5 minutes!
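
    In outline, the pattern looks like this (the hash name and exit value are placeholders; note that _exit also skips END blocks and doesn't flush buffered output):

        use POSIX ();
        use vars qw(%big_hash);    # global, so it never goes out of scope

        # ... load and use %big_hash ...

        # Let the OS reclaim the memory in one go instead of
        # paying for perl's per-entry teardown at exit:
        POSIX::_exit(0);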

    Regards

    Jeff

      Well, Perl does have its own malloc, which you can turn on if you compile Perl with -Dusemymalloc.
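
      For what it's worth, you can check how a given perl was built via the standard Config module:

          use Config;
          print "usemymalloc=$Config{usemymalloc}\n";   # 'y' if perl's own malloc is in use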

      Abigail

        We tried Perl's malloc and had even worse problems than with the GCC 2.95 malloc, which runs slowly but at least runs reliably to completion.

        We also tried upgrading GCC, but then our MySQL connections would fail after several tens of MB of interaction with the MySQL server.

        regards,

        Jeff

Re: freeing hashes on Linux
by hardburn (Abbot) on May 22, 2003 at 15:15 UTC

    If you're only worried about the memory, don't be. If you're not using it, it'll go to swap space soon enough; perl won't actually return that space to the system until the interpreter exits, anyway.
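
    A quick, Linux-specific way to watch this (a sketch only; assumes /proc is mounted):

        {
            my %h = map { $_ => 1 } 1 .. 1_000_000;
        }   # %h is gone, but the process footprint barely shrinks

        open STATUS, '< /proc/self/status' or die "open: $!";
        print grep { /^Vm(Size|RSS)/ } <STATUS>;
        close STATUS;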

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated