OK, now I have benchmarked the sort and the walk through only once solution (two variants):
use strict; use warnings; use Benchmark qw(timethese); my $nb_elements = 2e6; my @top3; my @startArray; push @startArray, ["", "", int rand (1e7)] for 1..$nb_elements; sub discard_sort{ my @masterArray = @startArray; my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2]; my $min_top = $top3[2][2]; $nb_elements--; for my $sub_aref (@masterArray [3..$nb_elements]) { next if $sub_aref->[2] <= $min_top; @top3 = (sort {$b->[2] <=> $a->[2]} $sub_aref, @top3)[0..2]; $min_top = $top3[2][2]; } } sub discard_manual{ my @masterArray = @startArray; @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2]; my $min_top = $top3[2][2]; $nb_elements--; for my $sub_aref (@masterArray [3..$nb_elements]) { next if $sub_aref->[2] <= $min_top; if ($sub_aref->[2] > $top3[0][2]) { unshift @top3, $sub_aref; pop @top3; } elsif ( $sub_aref->[2] > $top3[1][2]) { $top3[2] = $top3[1]; $top3[1] = $sub_aref; } else { pop @top3; push @top3, $sub_aref; } $min_top = $top3[2][2]; } } sub sort_strategy { my @masterArray = @startArray; my @top3 = (sort {$b->[2] <=> $a->[2]} @masterArray)[0..2]; } timethese (5, { 'walk once with sort' => \&discard_sort, 'walk once manual insert' => \&discard_manual, 'sort' => \&sort_strategy});
The results show pretty clearly that the walk-through once solution which I originally proposed is much much faster than a sort on the full array:
Benchmark: timing 5 iterations of sort, walk once manual insert, walk +once with sort... sort: 60 wallclock secs (60.37 usr + 0.02 sys = 60.39 CPU) @ 0.08/s +(n=5) walk once manual insert: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2 +.79 CPU) @ 1.79/s (n=5) walk once with sort: 3 wallclock secs ( 2.76 usr + 0.00 sys = 2.76 +CPU) @ 1.81/s (n=5)
On a dataset of 2 million records, the walk through once solution is about 21 times faster than the solution sorting the full array. For the walk through once solution, I tried two variants: using sort to maintain the auxiliary array and doing a manual insertion sort on this auxiliary array. It turns out there is very little difference between these two solutions, but the simpler sort function solution is actually slightly better, so it was not worth the effort of trying a manual insertion sort.
The larger the dataset, the larger the advantage of the walk through once strategy: with 5 million records, the walk-through once solution is 24.6 faster than the full array sort:
Benchmark: timing 5 iterations of sort, walk once manual insert, walk +once with sort... sort: 170 wallclock secs (169.96 usr + 0.03 sys = 169.99 CPU) @ 0.03 +/s (n=5) walk once manual insert: 7 wallclock secs ( 6.90 usr + 0.03 sys = 6 +.93 CPU) @ 0.72/s (n=5) walk once with sort: 7 wallclock secs ( 6.86 usr + 0.00 sys = 6.86 +CPU) @ 0.73/s (n=5)
Note that, in order to produce realistic benchmark figures for the sort solution, I had to make a copy of the original array, so that the sort would not work on an already sorted array. Part of the recorded durations is therefore wasted making a copy of the array. If we were to subtract this part from the recorded duration, the advantage of the walk through once solution would be even larger. The copying of the 5-million record array takes about 2 seconds, so that the corrected figures would be 168 sec. for the sort, and 5 sec. for the walk through once solutions, meaning that the later is actually about 33 times faster than the sort.
In reply to Re: sorting array of arrays reference
by Laurent_R
in thread sorting array of arrays reference
by kimlid2810
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |