in reply to *Fastest* way to print a hash sorted by value
Using a binary chop to insert elements into a array of the highest N elements seems to work quite well, taking around 1/6 th time to find the highest 100 elements from 1_000_000 despite being pure perl.
The saving at 500_000 is around 2 /3rds.
P:\test>282315 -N=100 -MAX=500000 -TRIALS=1 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 +499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 +499984 1 trial of insertion sort (8.002s total) 1 trial of standard sort & slice (26.198s total)
The saving at 1_000_000 is around 5 /6ths.
P:\test>282315 -N=100 -MAX=1000000 -TRIALS=1 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 9 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 9 1 trial of insertion sort (15.733s total) 1 trial of standard sort & slice (90.730s total)
The saving at 10_000_000 will be ?? 99/100 ths ??? I don't have enough memory to try this!
#! perl -slw use strict; use Benchmark::Timer; my $T = Benchmark::Timer->new; use vars qw[ $N $MAX $TRIALS ]; $N ||= 10; $MAX ||= 100; $TRIALS ||= 10; srand(1); my @unsorted = map{ int rand $MAX } 1 .. $MAX; my @topN1; for( 1 .. $TRIALS ) { @topN1 = (); $T->start( 'insertion sort' ); iSortN( \@topN1, $N, \@unsorted ); $T->stop( 'insertion sort' ); } print "@{[ reverse @topN1 ]}"; my @topN2; for( 1 .. $TRIALS ) { $T->start( 'standard sort & slice' ); @topN2 = ( sort{ $b <=> $a } @unsorted )[ 0 .. ($N-1) ]; $T->stop( 'standard sort & slice' ); } print "@{[ @topN2 ]}"; $T->report; exit; use constant { AREF=>0, LISTSIZE=>1, INPUT=>2 }; sub iSortN { # AREF = Array ref to hold the list. # LISTSIZE = then number of elements to retain. # Either a single value or an array ref to a batch of values # One peculiarity of the implementation is that the list comes back in + reverse order. # Fixing this is an exercise for the user:) my $input = ref $_[INPUT] ne 'ARRAY' ? [ $_[INPUT] ] : $_[INPUT]; push( @{ $_[AREF] }, 0 ) unless @{ $_[AREF] }; for ( @{ $input } ) { if( $_ <= $_[AREF]->[ 0 ] ) { unshift @{ $_[AREF] }, $_; } elsif( $_ >= $_[AREF]->[ -1 ] ) { push @{ $_[AREF] }, $_; } else { splice @{ $_[AREF] }, bsearch( $_[AREF], $_ ), 0, $_; } shift @{ $_[AREF] } if @{ $_[AREF] } > $_[LISTSIZE]; } } sub bsearch { my( $aref, $value ) = @_; my( $lo, $hi, $mid ) = ( -1, $#{ $aref }, 0 ); while( $lo <= $hi ) { $mid = int( ($lo + $hi) / 2 ); if( $value > $aref->[ $mid ] ) { $lo = ++$mid; } elsif( $value < $aref->[ $mid ] ) { $hi = --$mid; } else { return $mid } } return $aref->[ $mid ] > $value ? $mid : $mid+1; } __END__ P:\test>282315 -N=10 -MAX=10000 9999 9998 9997 9997 9996 9996 9994 9992 9990 9990 9999 9998 9997 9997 9996 9996 9994 9992 9990 9990 10 trials of insertion sort (1.612s total), 161.232ms/trial 10 trials of standard sort & slice (1.031s total), 103.148ms/trial P:\test>282315 -N=10 -MAX=100000 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 99996 99996 99996 99996 99996 99996 99993 99993 99993 99990 10 trials of insertion sort (16.424s total), 1.642s/trial 10 trials of standard sort & slice (17.645s total), 1.765s/trial P:\test>282315 -N=10 -MAX=200000 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 199993 10 trials of insertion sort (31.996s total), 3.200s/trial 10 trials of standard sort & slice (37.694s total), 3.769s/trial P:\test>282315 -N=10 -MAX=300000 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 299990 10 trials of insertion sort (47.869s total), 4.787s/trial 10 trials of standard sort & slice (62.895s total), 6.289s/trial P:\test>282315 -N=10 -MAX=500000 P:\test>282315 -N=10 -MAX=500000 -TRIALS=1 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 1 trial of insertion sort (8.833s total) 1 trial of standard sort & slice (30.524s total) P:\test>282315 -N=100 -MAX=500000 -TRIALS=1 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 +499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 499984 +499984 1 trial of insertion sort (8.002s total) 1 trial of standard sort & slice (26.198s total) P:\test>282315 -N=100 -MAX=1000000 -TRIALS=1 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 9 999969 999969 999969 999969 999969 999969 999969 999969 999969 999969 +999969 9 1 trial of insertion sort (15.733s total) 1 trial of standard sort & slice (90.730s total)
You could probably save even more if you don't have to have the whole 10_000_000 in memory for any other reason than the sort by reading them in in batches of 10_000 or so and pasing them to iSortN(). So long as you pass in the same arrayref as the first parameter, you can sort in batches or even one at a time (with the obvious penaties).
|
|---|