in reply to Sorting Hash By Large Number Values
Sorting so as to grab the top N elements is only efficient if you need the list to be sorted anyway.
On the other hand, if you just need top N, and you need it again from time to time, you might use a heap. Heaps are great for keeping track of what comes at the top, without wasting a lot of time keeping the bottom precisely sorted. If all you want is the top N elements just once, a heap is overkill, but I couldn't resist the opportunity to brush up on my recollection of how to use Heap::Simple.
The following snippet generates a hash of a whole bunch of elements. The keys are 'AA' through 'FF', and the values are random integers from 0 through 999. The hash is then converted to a heap (this constitutes the bulk of the overhead) where the heap is heap-ordered by values from the hash. Key/Value pairs are stored in anonymous arrays held in the heap. When you pluck an item from the heap, it's set up to yield the item with the greatest value. The next item you pluck will be the item with second greatest value, and so on.
Here's the example snippet...
use strict; use warnings; use Heap::Simple::Perl; my %hash; $hash{$_} = int( rand( 1000 ) ) for 'AA' .. 'FF'; my $find_n = 10; my $heap = heapify( \%hash ); my $top = get_top_n( $heap, $find_n ); foreach my $element ( @{$top} ) { print "$element->[0]\t=>$element->[1]\n"; } sub heapify { my( $href ) = $_[0]; my $heap = Heap::Simple::Perl->new( 'order' => '>', 'elements' => ['Array'] ); $heap->insert( [ $href->{$_}, $_ ] ) foreach keys %{$href}; return $heap; } sub get_top_n { my( $heap, $n ) = @_; my $top_n ; for( 1 .. $n ) { my( $value, $key ) = @{ $heap->extract_top() }; push @{ $top_n }, [ $key, $value ]; } return $top_n; }
Again, a heap will be useful and efficient if you are frequently needing the top N items from the list. But it's not all that efficient if all you're doing is grabbig the top N one time. For that, just go through the list one element at a time and keep track of the top ten values (no need to sort; that's just a waste of time).
You can get more information on heaps and priority queues here: Wikipedia: Heap, as well as (of course) from the POD for Heap::Simple. My favorite reference however is Mastering Algorithms with Perl
Again, I don't mean to imply that a heap is definately the answer to your need. It may be (depending on how many times you'll be pulling the top-n elements), but it may just be too much work. I post it here as a point of discussion and thank you for giving me the excuse to play around with these useful datastructures again.
Dave
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Sorting Hash By Large Number Values
by thospel (Hermit) on Oct 06, 2005 at 21:04 UTC | |
Re^2: Sorting Hash By Large Number Values
by anonymized user 468275 (Curate) on Aug 03, 2005 at 13:08 UTC | |
by davido (Cardinal) on Aug 03, 2005 at 15:31 UTC | |
by anonymized user 468275 (Curate) on Aug 03, 2005 at 15:40 UTC | |
Re^2: Sorting Hash By Large Number Values
by Limbic~Region (Chancellor) on Aug 03, 2005 at 13:27 UTC | |
Re^2: Sorting Hash By Large Number Values
by anonymized user 468275 (Curate) on Aug 04, 2005 at 08:16 UTC |