itub has asked for the wisdom of the Perl Monks concerning the following question:

I have been sorting some large arrays of numbers (up to a few million), and I noticed that the program takes a long time to "clean up" after it finishes.

Update: Perhaps I posted too soon. The problem doesn't happen with perl 5.8.3; with 5.8.0 it happens on one machine but not on another. Perhaps other monks can try it with different versions of perl for comparison?

Some minimal sample code:

use strict;
use warnings;
use List::Util 'shuffle';
use Time::HiRes qw(gettimeofday tv_interval);

my $N  = 500_000;
my $t1 = [gettimeofday];

my @a = (1 .. $N);
prof("array created");
my @b = shuffle @a;
prof("array shuffled");
my @c = sort {$a <=> $b} @b;
prof("array sorted");

# print the time elapsed since the previous checkpoint, then reset the clock
sub prof {
    my $name  = shift;
    my $t2    = [gettimeofday];
    my $delta = tv_interval($t1, $t2);
    print "$delta\t$name\n";
    $t1 = $t2;
}

Which gives this output:

$ time ./t2.pl 
0.268895        array created
0.359648        array shuffled
2.382604        array sorted

real    0m22.609s
user    0m21.820s
sys     0m0.220s

Notice that the real time is 22 s, which means the process finished almost 20 s after the sort was done (that's why I call this a "clean-up" delay, for lack of a better name). I don't know how the internals work, but the problem doesn't happen when I don't sort. If I remove the sort line, I get:

0.488084        array created
0.739037        array shuffled

real    0m1.547s

Even if I increase $N to 5_000_000, cleanup without sorting stays reasonably fast (the real time is only about 1.6 s more than the two timed steps):

2.729688        array created
4.622434        array shuffled

real    0m8.948s

Does anyone have any idea why this happens? I'm using perl 5.8.0 on Linux.
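A possible way to narrow this down (my own suggestion, not something from the thread): free the arrays explicitly while the timer is still running, so that any time spent releasing memory shows up as its own line instead of after the script ends. This is only a diagnostic sketch; perl keeps some memory in its own arenas until interpreter shutdown, so an explicit undef may move only part of the delay.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Time::HiRes qw(gettimeofday tv_interval);

my $N  = 500_000;
my $t1 = [gettimeofday];

# Print the time elapsed since the previous checkpoint, then reset the clock.
sub prof {
    my $name  = shift;
    my $t2    = [gettimeofday];
    my $delta = tv_interval($t1, $t2);
    print "$delta\t$name\n";
    $t1 = $t2;
}

my @a = (1 .. $N);
prof("array created");
my @b = shuffle @a;
prof("array shuffled");
my @c = sort { $a <=> $b } @b;
prof("array sorted");

# Release the arrays inside the timed region (diagnostic assumption:
# a slow free() should now show up in these lines).
undef @c;
prof("sorted array freed");
undef @b;
undef @a;
prof("other arrays freed");
```

If most of the 20 s moves into the "freed" lines, the time is going into freeing memory rather than into the sort itself.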

Re: Sort taking a long time to clean up
by dave_the_m (Monsignor) on Sep 15, 2004 at 21:12 UTC
    I bet the machine with the slow cleanup is a Linux box. There's a bug in some versions of malloc that makes freeing memory take ages.

    Dave.

      As you note, there is (or was) a malloc bug that makes destroying big data structures slow. A workaround is:

      use POSIX;
      # code here
      POSIX::_exit(0);

      This skips perl's normal cleanup and leaves it to the OS.
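A slightly fuller sketch of this workaround. One caveat: POSIX::_exit ends the process immediately, so it also skips END blocks and object destructors, and it does not flush perl's output buffers; anything still buffered on STDOUT (for example when output goes to a pipe or a file) can be lost unless it is flushed first. The helper name exit_without_cleanup is made up for illustration, and the sort is just stand-in work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();
use IO::Handle ();

# Hypothetical helper: flush perl's own output buffers, then end the
# process at once, skipping END blocks, destructors and global cleanup.
# The OS then reclaims the process's memory in bulk.
sub exit_without_cleanup {
    my $status = @_ ? shift : 0;
    STDOUT->flush;
    STDERR->flush;
    POSIX::_exit($status);
}

# Stand-in workload: sort half a million random integers.
my @c = sort { $a <=> $b } map { int rand 1_000_000 } 1 .. 500_000;
print scalar(@c), " numbers sorted\n";

exit_without_cleanup(0);
```

Call it as the very last statement, and only when nothing left in the program (DESTROY methods, END blocks, temporary-file cleanup) still needs to run.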

      cheers

      tachyon

        Thanks, that's good to know. I compiled a new perl with the option "use the malloc that comes with perl", and the problem went away.
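For anyone comparing machines or perl versions, here is a small sketch that reports which malloc a given perl was built with. It reads the usemymalloc entry from the standard Config module ('y' for perl's own malloc, 'n' for the system malloc); from the shell, perl -V:usemymalloc prints the same value. Building with perl's malloc corresponds to answering yes to that Configure question, or passing -Dusemymalloc to Configure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# $Config{usemymalloc} records how this perl was built:
# 'y' = perl's bundled malloc, 'n' = the system (libc) malloc.
my $mymalloc = $Config{usemymalloc};
printf "perl v%vd, usemymalloc=%s\n", $^V, $mymalloc;
```

Running this on the fast and the slow 5.8.0 machines would show whether the difference lines up with the malloc choice.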
Re: Sort taking a long time to clean up
by CountZero (Bishop) on Sep 15, 2004 at 21:15 UTC
    Can you check whether your computer started swapping to disk during the sort? I assume the sort needs additional memory to hold intermediate results, and you could easily cross that magic boundary where memory gets swapped to disk.

    "Cleaning up" would then involve reading the data back from disk to fill the memory again with the data for other application which were temporarily swapped out of your computer's memory when the sort ran.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      I don't think it's swapping. The process takes about 70 MB and there are still at least 100 MB free. The $N=500_000 process with sort takes less memory than the $N=5_000_000 process without sort, yet the latter finishes faster.