in reply to Re^5: Howto Avoid Memory Problem in List::MoreUtils
in thread Howto Avoid Memory Problem in List::MoreUtils

If you need 1.4GB to sort 400MB that means that you are probably converting numbers to strings inside the sort.

Yep! That's the problem.

{ ($c, $d) = ($a,$b); $c cmp $d

I wonder if that optimisation could be worked into the internal sort?

I started to suggest that if both had the (same) IOK or NOK flags set, then use a numeric comparison, but I guess it would only take one string embeded at the wrong place for the entire sort to have to be done over.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re^6: Howto Avoid Memory Problem in List::MoreUtils

Replies are listed 'Best First'.
Re^7: Howto Avoid Memory Problem in List::MoreUtils
by salva (Canon) on May 04, 2006 at 13:18 UTC
    I started to suggest that if both had the (same) IOK or NOK flags set, then use a numeric comparison

    but if you are sorting lexicograficaly, you can not compare numbers as numbers because they are not equivalent operations (i.e.: 2 < 10 but 2 gt 10).

    It shouldn't be too difficult to write a sorting sub in XS that uses a comparison callback equivalent to the Perl {($c, $d) = ($a, $b); $c cmp $d }. But converting the numbers to strings is an expensive operation, even in C.

    Also, you can apply a transformation to the numbers to make them sort lexicograficaly when compared as numbers. For instance, for positive integers with 8 or less digits (untested):

    use Sort::Key 'ukeysort_inplace'; my $digits = 8; ukeysort_inplace { my $a = int $_; length $a > $digits and die 'integer $a is too big'; $a .= ' ' x ($digits - length $a); $a =~ tr/ 0123456789/0123456789a/; hex $a } @array;

    (usortkey_inplace should not use more than 8 additional bytes per value on a 32bits machine, update: + 4 bytes for the merge phase of the mergesort)

      but if you are sorting lexicograficaly,

      True. One further question though.

      Why does using sort{ $a <=> $b } @numbers also cause the memory growth. The same appears to be true for Sort::Key::nsort(_inplace)?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        how have you created the scalars inside @numbers?

        If you have read them from a file and never used them as numbers before, comparing then numerically also forces their internal structure to grow to accomodate the NV and/or IV slot.

        Also, in not too old versions of perl, there was a bug that caused numbers to be converted to strings inside sort even when numeric comparison was used.

        Besides that, in general, perl mergesort uses the memory equivalent to two pointers per value, one to pass the value on the stack and the other for the merge. Sort::Key functions use the same plus the memory required for the key (for instance, 4 bytes for an integer key on a 32bits computer or 8 bytes for a floating point number).