in reply to sorting array of arrays reference

If you need only the three top elements, then sorting the array of arrays is overkill and will be inefficient if the master array is large. Not good if speed is really important.

I would go for a selection algorithm, which could be done in two different ways: (1) the simpler (but slower) way is to walk through the array once and remember the array ref of the top element, then walk through it again to find the next-highest element, and then a third time for the third; (2) slightly more complicated is to walk through the array only once, maintaining as you go an auxiliary array of the three top elements. Maintaining this auxiliary array is somewhat complicated, but nothing insurmountable.
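For what it's worth, approach (1) might look roughly like the sketch below. The helper name top_n_multipass and the %taken bookkeeping hash are made up for illustration, and the sample data assumes the value to compare sits in column 2 of each sub-array.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): return the $n sub-arrays
# with the largest value in column $col, making one full pass over
# @$rows for each element wanted.
sub top_n_multipass {
    my ($rows, $col, $n) = @_;
    my (@top, %taken);
    for (1 .. $n) {
        my $best;    # index of the best row not yet taken
        for my $i (0 .. $#$rows) {
            next if $taken{$i};
            $best = $i
                if !defined $best or $rows->[$i][$col] > $rows->[$best][$col];
        }
        last unless defined $best;    # fewer than $n rows available
        $taken{$best} = 1;
        push @top, $rows->[$best];
    }
    return @top;
}

my @masterArray = (
    ["this", "that", 12563,  "something", "else"],
    ["this", "that", 10,     "something", "else"],
    ["this", "that", 1,      "something", "else"],
    ["this", "that", 125638, "something", "else"],
    ["this", "that", 300000, "something", "else"],
);

my @top3 = top_n_multipass(\@masterArray, 2, 3);
print join(", ", map { $_->[2] } @top3), "\n";    # 300000, 125638, 12563
```

This costs three full passes for a top three, which is why it is the slower of the two options, but there is no auxiliary array to keep in order.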

Re^2: sorting array of arrays reference
by kimlid2810 (Acolyte) on Oct 25, 2013 at 20:23 UTC
    how is this auxiliary array going to be maintained?

      OK, a bit more time now, this is one possible way of doing it:

      use strict;
      use warnings;
      use Data::Dumper;

      my @masterArray = (
          ["this", "that", 12563,  "something", "else"],
          ["this", "that", 10,     "something", "else"],
          ["this", "that", 1,      "something", "else"],
          ["this", "that", 125638, "something", "else"],
          ["this", "that", 300000, "something", "else"],
      );

      # Seed the auxiliary array with the first three records, sorted descending.
      my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2];
      my $min_top = $top3[2][2];

      for my $sub_aref (@masterArray[3..$#masterArray]) {
          # Compare the value in column 2, not the reference itself.
          next if $sub_aref->[2] <= $min_top;
          @top3 = (sort {$b->[2] <=> $a->[2]} @top3, $sub_aref)[0..2];
          $min_top = $top3[2][2];
      }
      print Dumper @top3;

      This yields the following result:

      $ perl subdiscard.pl
      $VAR1 = [
                'this',
                'that',
                300000,
                'something',
                'else'
              ];
      $VAR2 = [
                'this',
                'that',
                125638,
                'something',
                'else'
              ];
      $VAR3 = [
                'this',
                'that',
                12563,
                'something',
                'else'
              ];

      A more general solution might be like this:

      use strict;
      use warnings;
      use Data::Dumper;

      my $nb_elements = shift;
      chomp $nb_elements;
      my @masterArray;
      push @masterArray, ["", "", int rand (1e7), ""] for 1..$nb_elements;
      # print Dumper \@masterArray;

      my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2];
      my $min_top = $top3[2][2];
      $nb_elements--;

      for my $sub_aref (@masterArray[3..$nb_elements]) {
          next if $sub_aref->[2] <= $min_top;
          @top3 = (sort {$b->[2] <=> $a->[2]} @top3, $sub_aref)[0..2];
          $min_top = $top3[2][2];
      }
      print Dumper \@top3;

      With one million records, the execution time is about 2.5 seconds:

      $ time perl subdiscard2.pl 1000000
      $VAR1 = [
                [
                  '',
                  '',
                  9999996,
                  ''
                ],
                [
                  '',
                  '',
                  9999993,
                  ''
                ],
                [
                  '',
                  '',
                  9999990,
                  ''
                ]
              ];

      real    0m2.497s
      user    0m2.386s
      sys     0m0.108s

      Sorting the original array and taking the first 3 elements takes about 3 times longer:

      $ time perl subdiscard3.pl 1000000
      $VAR1 = [
                [
                  '',
                  '',
                  9999980,
                  ''
                ],
                [
                  '',
                  '',
                  9999955,
                  ''
                ],
                [
                  '',
                  '',
                  9999944,
                  ''
                ]
              ];

      real    0m7.605s
      user    0m7.518s
      sys     0m0.093s

      But, in fact, most of the 2.5 seconds taken by the selection program (more than 2.2 seconds) is spent populating the array with random values. The sort version pays the same setup cost, so the real difference between the two algorithms is much larger than these timings suggest, probably at least a factor of 10. I'll do a real benchmark later if I can find the time.
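One way such a benchmark could be set up is sketched here with the core Benchmark module, building the data once so that only the two algorithms are timed. The sub names top3_select and top3_sort, the default size of 100,000 records and the one-second timing window are all assumptions made for this sketch, and it expects at least three records. No timings are quoted because none were measured for it.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed default size when no argument is given on the command line.
my $nb_elements = shift(@ARGV) || 100_000;

# Populate the array once, outside the timed code.
my @masterArray;
push @masterArray, ["", "", int rand(1e7), ""] for 1 .. $nb_elements;

# Single pass with an auxiliary array of the three top records.
sub top3_select {
    my @top3    = sort { $b->[2] <=> $a->[2] } @masterArray[0 .. 2];
    my $min_top = $top3[2][2];
    for my $sub_aref (@masterArray[3 .. $#masterArray]) {
        next if $sub_aref->[2] <= $min_top;
        @top3    = (sort { $b->[2] <=> $a->[2] } @top3, $sub_aref)[0 .. 2];
        $min_top = $top3[2][2];
    }
    return @top3;
}

# Full sort of the whole array, then keep the first three records.
sub top3_sort {
    return (sort { $b->[2] <=> $a->[2] } @masterArray)[0 .. 2];
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, {
    selection => \&top3_select,
    full_sort => \&top3_sort,
});
```

Because the random data is generated before cmpthese starts, the population cost drops out of the comparison entirely.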

      Nothing really complicated, but I just can't do it right now. No time.
Re^2: sorting array of arrays reference
by BillKSmith (Monsignor) on Oct 26, 2013 at 18:55 UTC
    Think of this as the first three passes of a bubble sort. I like it.
    Bill
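Taken literally, the bubble-sort comparison could be sketched as follows. This is only an illustration of the analogy, not the code posted in the thread: the helper name top_n_bubble is made up, and it works on a shallow copy so the caller's array keeps its order.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): run $n bubble-sort passes
# over a copy of the rows. Each pass carries the largest remaining value
# in column $col to the end of the unsorted part.
sub top_n_bubble {
    my ($rows, $col, $n) = @_;
    my @work = @$rows;            # shallow copy of the list of refs
    $n = @work if $n > @work;     # cannot return more rows than exist
    for my $pass (1 .. $n) {
        for my $i (0 .. $#work - $pass) {
            @work[$i, $i + 1] = @work[$i + 1, $i]
                if $work[$i][$col] > $work[$i + 1][$col];
        }
    }
    # The last $n slots now hold the top values in ascending order.
    return reverse @work[@work - $n .. $#work];
}

my @masterArray = (
    ["this", "that", 12563,  "something", "else"],
    ["this", "that", 10,     "something", "else"],
    ["this", "that", 1,      "something", "else"],
    ["this", "that", 125638, "something", "else"],
    ["this", "that", 300000, "something", "else"],
);

my @top3 = top_n_bubble(\@masterArray, 2, 3);
print join(", ", map { $_->[2] } @top3), "\n";    # 300000, 125638, 12563
```

Unlike the single-pass version with the auxiliary array, this still walks the data three times and swaps elements along the way, so it is closer in cost to the first of the two approaches described at the top of the thread.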