
I have no idea how much additional memory Heap::Simple::XS uses under the covers,

For 1e6 items, memory usage grows from 145MB to over 200MB; scaled up to 10e6 items, that is going to push a 32-bit machine into swapping.

That said, I think this memory usage may, in part at least, be due to a bug in this incarnation of the code.

I cannot see what would prevent this loop copying everything from %hash into both %known and the heap?

    while (my ($key, $val) = each %hash) {
        next if defined $val;
        if (exists $known{$key}) {
            $hash{$key} = $known{$key};
            next;
        }
        $heap->insert($key);
    }
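
One way to check how much a single pass of this loop retains is to run it against a size-limited container. The following is only a rough sketch in core Perl: bounded_insert and one_pass are hypothetical helpers, and the sorted array is a stand-in for a bounded heap, not Heap::Simple::XS itself. It assumes the container keeps the smallest keys (by string order) and discards the rest once it holds $at_once of them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a bounded heap: keeps at most $max keys in
# ascending string order and discards the largest on overflow.
sub bounded_insert {
    my ($kept, $max, $key) = @_;
    # Linear scan for the insertion point; adequate for a small $max.
    my $i = 0;
    $i++ while $i < @$kept && $kept->[$i] lt $key;
    splice @$kept, $i, 0, $key;
    pop @$kept if @$kept > $max;    # drop the overflow item
    return;
}

# One pass over the hash, mirroring the loop above. Every unassigned key
# is offered to the container, but it never holds more than $at_once.
sub one_pass {
    my ($hash, $known, $at_once) = @_;
    my @kept;
    keys %$hash;                    # reset the each() iterator
    while (my ($key, $val) = each %$hash) {
        next if defined $val;       # already assigned in an earlier pass
        if (exists $known->{$key}) {
            $hash->{$key} = $known->{$key};   # reuse a value learned earlier
            next;
        }
        bounded_insert(\@kept, $at_once, $key);
    }
    return \@kept;
}

# First pass: no values are set yet, so all five keys are offered,
# yet only two survive in the container.
my %hash = map { $_ => undef } qw(pear apple fig kiwi date);
my %known;
my $kept = one_pass(\%hash, \%known, 2);
print "@$kept\n";    # prints "apple date"
```

If the container really is bounded like this, the extra memory per pass is proportional to $at_once rather than to the size of %hash, so growth beyond that would have to come from somewhere else.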

Overall, the approach used in the second snippet in Re^2: In-place sort with order assignment seems to be the best. For 1e6 items it takes 8 seconds and very little extra memory, versus 50 seconds and +25% for the heap. It also happily handles 10e6 items in 108 seconds and under 2GB.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^11: In-place sort with order assignment
by Limbic~Region (Chancellor) on Sep 20, 2010 at 13:18 UTC
    BrowserUk,
    I cannot see what would prevent this loop copying everything from %hash into both %known and the heap?

    next if defined $val; will skip any keys from %hash that we have previously assigned a value to.

    $hash{$key} = $known{$key};next; will assign any values we learned from the last run and then move on to the next record.

    $heap->insert($key); will only insert records into the heap for keys that we have not assigned a value to (either in a previous run or this run). Update: According to the documentation, max_count => $at_once will throw out items from the heap beyond that point. If that doesn't work as advertised, that may be the source of the additional memory.

    Cheers - L~R

      I was thinking that on the first pass, no values would be set, therefore everything would end up in the heap. Whilst everything gets added, I was unaware that things were discarded beyond the specified maximum.
