in reply to Howto Avoid Memory Problem in List::MoreUtils

The algorithm used by List::MoreUtils::uniq internally is fast (O(N)) but uses a lot of memory.

You can use an alternative algorithm as sort-and-remove-consecutive-duplicates that is slower (O(NlogN)) but that doesn't require additional memory:

@array = sort @array; # this is optimized by perl as a sort # in place operation that doesn't # require additional memory my $last = $array[0]; my $i = 1; while($i<@array) { if ($array[$i] eq $last) { splice @array, $i, 1 } else { $last = $array[$i]; $i++; } }

Replies are listed 'Best First'.
Re^2: Howto Avoid Memory Problem in List::MoreUtils
by salva (Canon) on May 04, 2006 at 09:48 UTC
    well, the algorithm on my previous post is actually O(N^2) on the worst case because of the splice.

    A better version is:

    @array = sort @array; my $j = 0; my $last; for my $e (@array) { if (!$j or $last ne $e) { $array[$j] = $last = $e; $j++; } } $#array = $j-1;
      Hi salva,
      Thanks so much for your reply. Just a minor question. What's the meaning of this line:
      $array[$j] = $last = $e;
      Why not simply:
      $array[$j] = $e;
        $last is used to remove dups. $last needs to be set to the $e of this pass to compare it to the $e of the next pass. I suppose you could get rid of $last and do
        @array = sort @array; my $j = -1; for my $e (@array) { if ($j < 0 || $array[$j] ne $e) { $array[++$j] = $e; } } $#array = $j;
Re^2: Howto Avoid Memory Problem in List::MoreUtils
by BrowserUk (Patriarch) on May 04, 2006 at 10:31 UTC
    this is optimized by perl as a sort in place operation

    In which version of Perl did that become true?

    Unless I'm doing it wrong, it doesn't seem to be true in 5.8.8?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      n which version of Perl did that become true?
      Since 5.8.4.
      $ perl583 -le '@a=(1,2); $r1 = \$a[0]; @a=sort@a; $r2 = \$a[0];print " +yes"if $r1 == $r2' $ perl584 -le '@a=(1,2); $r1 = \$a[0]; @a=sort@a; $r2 = \$a[0];print " +yes"if $r1 == $r2' yes $

      Dave.

      well, you are right, I forgot that mergesort needs additional memory to perform the merge, but yet @a = sort @a uses half the memory required by other sorting expressions, for instance:
      use Memchmark 'cmpthese'; # use sort '_quicksort'; cmpthese( none => sub { my @a = map { int rand 100 } 1..1000000; }, sort0 => sub { my @a = map { int rand 100 } 1..1000000; @a = + sort { $a <=> $b } @a }, sort1 => sub { my @a = map { int rand 100 } 1..1000000; @a = + sort { $a <=> $b } @a, 1 } ); __END__ test: none, memory used: 3997696 bytes test: sort0, memory used: 7999488 bytes test: sort1, memory used: 12390400 bytes
      And if the quicksort algorithm is selected, then the sentence on my previous post becomes really true:
      use sort '_quicksort'; ... __END__ test: none, memory used: 3997696 bytes test: sort0, memory used: 3997696 bytes test: sort1, memory used: 12390400 bytes

        Hmm. I guess we have different understandings of the term In-place sort.

        I was trying to sort (using the default algorithm), 20e6 elements which takes 400 MB, but it was consuming 1.4 GB gross (some memory has been return to the system by the time the second memcheck runs below).

        C:\test>junk Mem after building array: 404,556 kb Mem after sorting array: 1,214,032 kb

        Having read your post, I tried the _quickersort option. Things improve marginally, but:

        C:\test>junk Mem after building array: 404,812 kb Mem after sorting array: 973,584 kb

        But that still requires a gross memory usage of 1.3 GB. 2N additional memory is somewhat greater than logN.

        A useful feature, and nice to know it's there, but not quite what I was expecting hoping for when you said in-place.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.