in reply to Re^8: In-place sort with order assignment
in thread In-place sort with order assignment

BrowserUk,
I have no idea how much additional memory Heap::Simple::XS uses under the covers, but it is dramatically faster than the splice-with-binary-search approach (above).
#!/usr/bin/perl
use strict;
use warnings;

use Heap::Simple::XS;
use Time::HiRes qw/gettimeofday tv_interval/;

my $items = $ARGV[0] || 100;

# Build a hash of $items keys ('a', 'b', ...), none of which has an
# order assigned yet.
my $str  = 'a';
my %hash = map {$str++ => undef} 1 .. $items;

# Work through the keys roughly 10% at a time; max_count keeps the
# heap from ever holding more than one batch.
my $at_once = int($items * .10);
my $heap    = Heap::Simple::XS->new(
    order     => "gt",
    elements  => "Scalar",
    max_count => $at_once,
);

my ($cnt, $beg, %known) = ($at_once, [gettimeofday], ());

while (1) {
    while (my ($key, $val) = each %hash) {
        next if defined $val;               # already ordered
        if (exists $known{$key}) {
            $hash{$key} = $known{$key};     # apply order learned last pass
            next;
        }
        $heap->insert($key);                # candidate for this pass
    }
    my $items = $heap->count;
    last if ! $items;                       # nothing left to order

    # Hand out the next block of order numbers to this pass's batch.
    %known = ();
    my $max = $cnt + $items;
    $known{$_} = $cnt-- for $heap->extract_all;
    $cnt = $max;
    $heap->clear;
}

my $elapsed = tv_interval($beg, [gettimeofday]);
my $per     = sprintf("%.7f", $elapsed / $items);
print "Took $elapsed seconds for $items items ($per per item)\n";

__DATA__
C:\tmp>perl buk2.pl 100
Took 0.001999 seconds for 100 items (0.0000200 per item)

C:\tmp>perl buk2.pl 1000
Took 0.021015 seconds for 1000 items (0.0000210 per item)

C:\tmp>perl buk2.pl 10000
Took 0.241327 seconds for 10000 items (0.0000241 per item)

C:\tmp>perl buk2.pl 100000
Took 3.375 seconds for 100000 items (0.0000338 per item)

C:\tmp>perl buk2.pl 1000000
Took 48.25 seconds for 1000000 items (0.0000483 per item)

Cheers - L~R

Re^10: In-place sort with order assignment
by BrowserUk (Patriarch) on Sep 20, 2010 at 07:40 UTC
    I have no idea how much additional memory Heap::Simple::XS uses under the covers,

    For 1e6 items, the memory usage grows from 145MB to over 200MB, which for 10e6 items is going to push a 32-bit machine into swapping.
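
    A minimal sketch of how such growth can be watched (an illustrative addition, not part of the original post; it assumes a Windows box, as in the transcripts above, with tasklist on the path, and polls this process's own working set):

    use strict;
    use warnings;

    # Pull this process's "Mem Usage" field out of tasklist's CSV output.
    sub mem_kb {
        my ($line) = grep /\S/, `tasklist /FI "PID eq $$" /FO CSV /NH`;
        return 0 unless defined $line;
        my ($kb) = $line =~ /"([\d,]+) K"/;
        $kb =~ s/,//g;
        return $kb;
    }

    printf "before: %d KB\n", mem_kb();
    my %hash = map { $_ => undef } 1 .. 1_000_000;
    printf "after:  %d KB\n", mem_kb();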

    That said, I think this memory usage may, in part at least, be due to a bug in this incarnation of the code.

    I cannot see what would prevent this loop from copying everything from %hash into both %known and the heap?

    while (my ($key, $val) = each %hash) {
        next if defined $val;
        if (exists $known{$key}) {
            $hash{$key} = $known{$key};
            next;
        }
        $heap->insert($key);
    }

    Overall, the approach used in the second snippet in Re^2: In-place sort with order assignment seems to be the best. It takes 8 seconds and very little extra memory for 1e6 items, versus 50 seconds and +25% memory for the heap. And it happily handles 10e6 items in 108 seconds and under 2GB.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      I cannot see what would prevent this loop from copying everything from %hash into both %known and the heap?

      next if defined $val; will skip any keys from %hash that we have previously assigned a value to.

      $hash{$key} = $known{$key}; next; will assign any values we learned from the last run and then move on to the next record.

      $heap->insert($key); will only insert records into the heap for keys that we have not assigned a value to (either in a previous run or this run). Update: According to the documentation, max_count => $at_once will throw out items from the heap beyond that point. If that doesn't work as advertised, that may be the source of the additional memory.
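
      That is easy to check with a few lines (a sketch added for illustration, not from the original exchange; it uses only the insert/count/extract_all calls already seen above):

      use strict;
      use warnings;
      use Heap::Simple::XS;

      my $max  = 5;
      my $heap = Heap::Simple::XS->new(
          order     => "gt",
          elements  => "Scalar",
          max_count => $max,
      );

      # 26 inserts into a 5-slot heap: if max_count works as advertised,
      # count() never exceeds $max.
      $heap->insert($_) for 'a' .. 'z';

      printf "heap holds %d of 26 inserted items\n", $heap->count;
      print join(" ", $heap->extract_all), "\n";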

      Cheers - L~R

        I was thinking that on the first pass no values would be set, and therefore everything would end up in the heap. Everything does get added, but I was unaware that items beyond the specified maximum were discarded.
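
        To see exactly which items survive the discarding, a variation of the same check (again an illustrative sketch, not from the thread) inserts a known range in random order and prints what the bounded heap kept:

        use strict;
        use warnings;
        use List::Util qw/shuffle/;
        use Heap::Simple::XS;

        my $heap = Heap::Simple::XS->new(
            order     => "gt",
            elements  => "Scalar",
            max_count => 5,
        );

        # Insert the alphabet in random order; the five keys printed at
        # the end are the ones the bounded heap chose to keep.
        $heap->insert($_) for shuffle 'a' .. 'z';
        print join(" ", $heap->extract_all), "\n";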


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.