Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've written a Perl script that reads in and processes a potentially large amount of data (~1e6 floats). The code works fine, but many minutes pass between the exit() call and the moment I actually get back to the shell. I've resorted to hitting ^Z and then killing the job, as my patience grew thin during a long debugging session.

The data is stored in a slightly complicated data structure:
$data->{$experiment_id}->[$row_index]->[ @values ]
where there are 5-30 experiment ids, row index goes up to 10,000-20,000, and @values contains 4-10 scalars.

Is there any way to shorten this delay?

Re: Exiting takes a looong time
by Elian (Parson) on Nov 19, 2002 at 20:09 UTC
    There's a bug in some versions of glibc's memory allocation system that makes freeing lots of small pieces of data take an extraordinarily long time (2.2.5 looks to be one of the prime culprits). There's not a whole lot perl itself can do about this. What you can do is either upgrade your libc (newer versions have a new memory allocation system which seems to fix this) or build perl with perl's own malloc.

    What's happening is that perl has allocated lots of little chunks of memory. When perl exits, it needs to go through and hand them all back to the system, like a well-behaved, potentially embedded program should. Unfortunately the afflicted versions of libc, when presented with a zillion frees, seem to go insane managing their free lists, with the performance problems you've noted. (This can also happen if you free up a lot of little chunks of memory in the middle of your program, in which case you'll get a noticeable pause.)
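
A minimal sketch for checking which allocator a given perl was built with, assuming the standard Config module. The keys `usemymalloc` and `gnulibc_version` are recorded when perl is built, so the glibc version shown is the one seen at build time and may differ from the library actually loaded at run time.

```perl
use strict;
use warnings;
use Config;

# 'y' means perl was built with its own malloc, 'n' means the system malloc.
my $mymalloc = $Config{usemymalloc} // 'unknown';

# Only set on glibc systems; empty or undefined elsewhere.
my $glibc = $Config{gnulibc_version} || 'not reported';

print "usemymalloc:     $mymalloc\n";
print "gnulibc_version: $glibc\n";
```

If `usemymalloc` reports `n` on an affected glibc, rebuilding perl with `-Dusemymalloc` passed to Configure is one way to try the second option described above.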

(z) Re: Exiting takes a looong time
by zigdon (Deacon) on Nov 19, 2002 at 18:51 UTC
    I'm guessing the long exit time is because perl must free all the memory it used up in the processing of all this data. Perhaps you could either make your algorithm more memory efficient, or use a tied hash? Not sure what that would do to your performance though.

    -- Dan
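
One possible reading of "more memory efficient" is sketched here, as an assumption rather than anything proposed in the thread: store each row as a single packed string of doubles instead of an array of scalars, so perl has far fewer small allocations to free. The helper names `store_row` and `fetch_row` are hypothetical.

```perl
use strict;
use warnings;

# $data{$experiment_id}[$row_index] holds one packed string per row
# instead of an array reference with one scalar per value.
my %data;

# Hypothetical helper: pack a row of floats into one string of native doubles.
sub store_row {
    my ($exp, $row, @values) = @_;
    $data{$exp}[$row] = pack 'd*', @values;
}

# Hypothetical helper: unpack a stored row back into a list of numbers.
sub fetch_row {
    my ($exp, $row) = @_;
    return unpack 'd*', $data{$exp}[$row];
}

store_row('exp1', 0, 1.5, 2.25, 3.0, 4.75);
my @values = fetch_row('exp1', 0);
print "@values\n";
```

The trade-off is that every access pays for an `unpack`, so this only helps if the statistics pass can work a row at a time.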

Re: Exiting takes a looong time
by BrowserUk (Patriarch) on Nov 19, 2002 at 20:28 UTC

    Here's a strange one for you. You don't mention what OS you're using, though the "hitting ^Z and then killing the job" bit indicates some form of unix, but I tried a very unscientific experiment on my NT system and replicated your hash structure. It took between 12 and 15 seconds for Perl/the OS to release the memory (no swapping involved) if I used exit, but less than 4 seconds if I just allowed the script to fall off the end.

    I had the test program print the time just prior to the exit/end-of-script and set my command prompt to display the time so I could see how long the process took.

    I have no idea why calling exit would take so much longer than not, but maybe it would work for you too?
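
If the time really is going into teardown, one workaround that nobody in this thread proposed, so treat it as an assumption, is to leave via `POSIX::_exit`, which ends the process without running END blocks, DESTROY methods, or perl's global destruction. The sketch below uses a hypothetical `fast_exit` helper and runs it in a forked child (so it assumes a unix-like system) to show that the exit status still reaches the parent.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Handle;

# Hypothetical helper: flush the handles we care about ourselves, because
# _exit does not flush perl's buffers, then leave without any cleanup.
sub fast_exit {
    my ($status) = @_;
    STDOUT->flush;
    STDERR->flush;
    POSIX::_exit($status);
}

# Run the demonstration in a child so the parent can inspect the result.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: build a small nested structure, then skip its destruction.
    my $data = {};
    for my $exp (1 .. 5) {
        for my $row (0 .. 999) {
            $data->{$exp}[$row] = [ map { $_ * 1.5 } 1 .. 5 ];
        }
    }
    fast_exit(0);
}

waitpid($pid, 0);
my $child_status = $? >> 8;
print "child exit status: $child_status\n";
```

Any other open filehandles would need to be closed or flushed explicitly before the call, and anything that relies on END blocks or destructors will not run.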



Re: Exiting takes a looong time
by pg (Canon) on Nov 19, 2002 at 19:21 UTC
    I tried to mimic what you are doing, but I don't see a long delay before the exit. I don't really think the size of the data is the main reason. However, I really think you should not keep all the data in memory at every moment (well, literally in memory, regardless of whether the OS is swapping it out or not). Do you need all of it all the time? Even if that does not contribute to the exit problem, it will hurt the performance of your application.

    use Data::Dumper;
    use strict;

    use constant MAX_TOUCH => 30;
    use constant MAX_EXP   => 30;
    use constant MAX_ROW   => 20000;

    my $hash;
    my $exp;
    my $row;

    # Build MAX_EXP experiments, each with MAX_ROW rows of ten values.
    for ($exp = 0; $exp < MAX_EXP; $exp ++) {
        print "exp = $exp\n";
        $hash->{$exp} = {};
        for ($row = 0; $row < MAX_ROW; $row ++) {
            my $array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
            $hash->{$exp}->{$row} = $array;
        }
    }

    my $test;

    # Touch the first value of every row, repeatedly.
    for (0 .. MAX_TOUCH) {
        print "touch = $_\n";
        for ($exp = 0; $exp < MAX_EXP; $exp ++) {
            for ($row = 0; $row < MAX_ROW; $row ++) {
                $hash->{$exp}->{$row}[0] ++;
            }
        }
    }

    #print Dumper $hash;
    print "about to exit\n";
    exit;

Re: Exiting takes a looong time
by iburrell (Chaplain) on Nov 19, 2002 at 19:53 UTC
    Are you doing anything else in your script? Are you sure that the delay is happening after the exit()? The only thing I can think of that might cause a delay is writing to a filehandle. Exiting closes the filehandles and tries to flush their buffers. If a filehandle blocks, then the close might hang.
      I went through it with the debugger and tried exiting at different points, and the delay only occurred after the data was in memory. Also, in the debugger, I ran $data = undef, and that gave me the delay too.

      And, yes, I do need that data in memory. I have to perform various manipulations and calculate statistics on the data which means I can't just process it one row at a time.
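
The observation that `$data = undef` alone reproduces the delay can be measured outside the debugger. This is a minimal sketch assuming Time::HiRes is available; the sizes are assumptions scaled down from the question so that it runs quickly, and they can be raised toward the real figures to see whether the freeing time grows out of proportion.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Assumed sizes, much smaller than the 5-30 x 10,000-20,000 x 4-10 in the question.
my ($n_exp, $n_rows, $n_vals) = (5, 2_000, 8);

# Build $data->{$experiment_id}->[$row_index]->[ @values ] as described.
my $data = {};
for my $exp (1 .. $n_exp) {
    for my $row (0 .. $n_rows - 1) {
        $data->{$exp}[$row] = [ map { rand() } 1 .. $n_vals ];
    }
}

# Time only the release of the structure.
my $start = time;
undef $data;
my $elapsed = time - $start;

printf "freeing %d values took %.3f s\n", $n_exp * $n_rows * $n_vals, $elapsed;
```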

Re: Exiting takes a looong time
by rdfield (Priest) on Nov 20, 2002 at 11:53 UTC
    I'm glad you posted this. I'm running a similar app, i.e. reading many rows of data into a complex data structure and then manipulating the data as a whole, and the problem definitely is in the memory de-allocation (I'm running this app on W2K). According to the task manager the app has 534MB RAM allocated to it (all but a small proportion of it to the one, main, data structure). Trying to free the memory ($hash->{head} = ();) results in the disk light coming on and that's about it. It's currently generating 90 page faults a second, and is approaching 500K page faults in total. Not bad for a data structure that took less than 10 minutes to build.

    rdfield

Re: Exiting takes a looong time
by Anonymous Monk on Sep 17, 2003 at 00:41 UTC
    I ran into a similar problem where building my data structure took about 2 minutes, and exiting perl (without exit) took about 23 minutes. I was forced to implement a DESTROY method for one or more objects in the script to 'manually' set each value to undef (I use array refs for objects). The DESTROY method was just a foreach loop. Using DESTROY I was able to cut the exit time to around 9 minutes, which is much better than 23, but still a bit high compared to 2.
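
A rough sketch of what such a DESTROY method might look like for an array-ref based object. The class name `Row` and the `$destroyed` counter are hypothetical and exist only to make the effect visible; the poster's actual code is not shown in the thread.

```perl
use strict;
use warnings;

package Row;    # hypothetical class name; objects are blessed array refs

our $destroyed = 0;    # counter used only to observe that DESTROY ran

sub new {
    my ($class, @values) = @_;
    return bless [@values], $class;
}

sub DESTROY {
    my ($self) = @_;
    # Set each slot to undef explicitly, as described in the post.
    $_ = undef foreach @$self;
    $destroyed++;
}

package main;

{
    my $row = Row->new(1.5, 2.5, 3.5);
}   # $row goes out of scope here, so DESTROY is called

print "rows destroyed: $Row::destroyed\n";
```

Whether this helps at all will depend on the allocator in use, which fits the poster's report that it reduced the exit time without removing the delay.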