Rather than trying to force the release memory back to the OS (which does happen naturally under the right circumstances as I demonstrated here and here) it would be better to avoid using it in the first place. Sorting 20,000,000 elements is going to take huge amounts of addition memory over and above that used to just store the data, especially when using the ST.

As you only require the top N values, you can sort the data in chunks, adding the current top N to the chunk each time and then extracting the new top N. In the end you have your required result, but any given sort is operating on a much smaller set, and the memory consumed sorting that set is available for re-use in the next sort. Memory growth is minimised.

Finding the balance between memory consumption and speed requires a little experimentation, but it would appear that there is little performance gain to be had by using 100 sorts of 10,000+N elements over 1,000 sorts of 1,000+N, and the latter uses far less memory.

#! perl -slw use strict; use vars qw[ $SIZE $N $CHUNK ]; $SIZE ||= 100_000; $N ||= 30; $CHUNK ||= 100; srand( 1 ); sub getTopN { my( $aref, $n ) = @_; my @top = @{ $aref }[ 0 .. ( $n - 1) ]; for( my $p = $CHUNK; $p < @$aref; $p += $CHUNK ) { my $end = $p + $CHUNK < @$aref ? $p + $CHUNK : $#{ $aref }; @top = ( sort{ $b <=> $a } @top, @{ $aref }[ $p .. $end ] )[ 0 .. ( $n - 1 ) ]; } return @top; } print scalar localtime; my @numbers; $#numbers = $SIZE; $numbers[ $_ ] = int rand $SIZE for 0 .. $SIZE; print scalar localtime; print "@{[ getTopN( \@numbers, $N ) ]}"; print scalar localtime;

Collating some results for different size arrays and chunk sizes, it shows that using this method, the sort times scale fairly linearly with array size and using a chunk size of 1000 gives about the best results, though you might want to investigate chunk sizes between the 1000 and 10,000 I tested.

Array size | gen.(s) | sort(s)| sort(s) | sort(s) | sort(s) | | (30) | (100) | (1000) | (10000) -----------+---------+--------+---------+---------+------- 100_000 | 1 | 4 | 3 | 2 | 2 1_000_000 | 12 | 55 | 38 | 33 | 36 2_000_000 | 29 | 138 | 102 | 88 | 93

By way of comparison, sorting the 1_000_000 numbers using a straight numeric sort takes 154 seconds which compares very unfavourably with the 33 seconds it takes to sort them in 1,000 passes of 1,030 elements at a time.

Some (truncated) raw results

P:\test>294065 -SIZE=100000 -N=30 -CHUNK=30 Thu Sep 25 10:42:39 2003 Thu Sep 25 10:42:40 2003 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 99990 9999 +0 99990 99987 99987 99987 Thu Sep 25 10:42:44 2003 P:\test>294065 -SIZE=100000 -N=30 -CHUNK=100 Thu Sep 25 10:42:55 2003 Thu Sep 25 10:42:56 2003 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 99990 9999 +0 99990 99987 99987 99987 Thu Sep 25 10:42:59 2003 P:\test>294065 -SIZE=100000 -N=30 -CHUNK=1000 Thu Sep 25 10:43:14 2003 Thu Sep 25 10:43:15 2003 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 99990 9999 +0 99990 99987 99987 99987 Thu Sep 25 10:43:17 2003 P:\test>294065 -SIZE=100000 -N=30 -CHUNK=10000 Thu Sep 25 10:43:29 2003 Thu Sep 25 10:43:30 2003 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 99990 9999 +0 99990 99987 99987 99987 Thu Sep 25 10:43:32 2003 P:\test>294065 -SIZE=1000000 -N=30 -CHUNK=30 Thu Sep 25 10:43:48 2003 Thu Sep 25 10:44:00 2003 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 999969 999969 99996 Thu Sep 25 10:44:55 2003 P:\test>294065 -SIZE=1000000 -N=30 -CHUNK=100 Thu Sep 25 10:45:13 2003 Thu Sep 25 10:45:25 2003 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 999969 999969 99996 Thu Sep 25 10:46:03 2003 P:\test>294065 -SIZE=1000000 -N=30 -CHUNK=1000 Thu Sep 25 10:46:15 2003 Thu Sep 25 10:46:26 2003 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 999969 999969 99996 Thu Sep 25 10:46:59 2003 P:\test>294065 -SIZE=1000000 -N=30 -CHUNK=10000 Thu Sep 25 10:47:20 2003 Thu Sep 25 10:47:31 2003 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 999969 999969 99996 Thu Sep 25 10:48:07 2003 P:\test>294065 -SIZE=2000000 -N=30 -CHUNK=30 Thu Sep 25 10:48:43 2003 Thu Sep 25 10:49:12 2003 1999938 1999938 1999938 1999938 1999938 1999938 1999938 1999938 199993 +8 1999938 1999938 1999938 Thu Sep 25 10:51:30 2003 P:\test>294065 -SIZE=2000000 -N=30 -CHUNK=100 Thu Sep 25 11:08:59 2003 Thu Sep 25 11:09:29 2003 1999938 1999938 1999938 1999938 1999938 1999938 1999938 1999938 199993 +8 1999938 1999938 1999938 1999938 1999938 1999938 1999938 1999938 Thu Sep 25 11:11:11 2003P:\test>294065 -SIZE=2000000 -N=30 -CHUNK=1000 Thu Sep 25 10:52:14 2003 Thu Sep 25 10:52:42 2003 1999938 1999938 1999938 1999938 1999938 1999938 1999938 1999938 199993 +8 1999938 1999938 1999938 Thu Sep 25 10:54:10 2003 P:\test>294065 -SIZE=2000000 -N=30 -CHUNK=10000 Thu Sep 25 10:55:13 2003 Thu Sep 25 10:55:41 2003 1999938 1999938 1999938 1999938 1999938 1999938 1999938 1999938 199993 +8 1999938 1999938 1999938 Thu Sep 25 10:57:14 2003

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.


In reply to Re: Force perl to release memory...(use less and sort faster too!) by BrowserUk
in thread Force perl to release memory back to the operating system by Roger

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.