Sorting so as to grab the top N elements is only efficient if you need the list to be sorted anyway.

On the other hand, if you just need top N, and you need it again from time to time, you might use a heap. Heaps are great for keeping track of what comes at the top, without wasting a lot of time keeping the bottom precisely sorted. If all you want is the top N elements just once, a heap is overkill, but I couldn't resist the opportunity to brush up on my recollection of how to use Heap::Simple.

The following snippet generates a hash of a whole bunch of elements. The keys are 'AA' through 'FF', and the values are random integers from 0 through 999. The hash is then converted to a heap (this constitutes the bulk of the overhead) where the heap is heap-ordered by values from the hash. Key/Value pairs are stored in anonymous arrays held in the heap. When you pluck an item from the heap, it's set up to yield the item with the greatest value. The next item you pluck will be the item with second greatest value, and so on.

Here's the example snippet...

use strict; use warnings; use Heap::Simple::Perl; my %hash; $hash{$_} = int( rand( 1000 ) ) for 'AA' .. 'FF'; my $find_n = 10; my $heap = heapify( \%hash ); my $top = get_top_n( $heap, $find_n ); foreach my $element ( @{$top} ) { print "$element->[0]\t=>$element->[1]\n"; } sub heapify { my( $href ) = $_[0]; my $heap = Heap::Simple::Perl->new( 'order' => '>', 'elements' => ['Array'] ); $heap->insert( [ $href->{$_}, $_ ] ) foreach keys %{$href}; return $heap; } sub get_top_n { my( $heap, $n ) = @_; my $top_n ; for( 1 .. $n ) { my( $value, $key ) = @{ $heap->extract_top() }; push @{ $top_n }, [ $key, $value ]; } return $top_n; }

Again, a heap will be useful and efficient if you are frequently needing the top N items from the list. But it's not all that efficient if all you're doing is grabbig the top N one time. For that, just go through the list one element at a time and keep track of the top ten values (no need to sort; that's just a waste of time).

You can get more information on heaps and priority queues here: Wikipedia: Heap, as well as (of course) from the POD for Heap::Simple. My favorite reference however is Mastering Algorithms with Perl

Again, I don't mean to imply that a heap is definately the answer to your need. It may be (depending on how many times you'll be pulling the top-n elements), but it may just be too much work. I post it here as a point of discussion and thank you for giving me the excuse to play around with these useful datastructures again.


Dave


In reply to Re: Sorting Hash By Large Number Values by davido
in thread Sorting Hash By Large Number Values by NathanE

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.