Gavin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Brethren,
Given a hash whose key/value pairs are sorted in descending order, and where the total number of pairs is known, is it possible to select, say, the top 10% or 20% and print them to a file?
And how might this be done?
Any help much appreciated.

Replies are listed 'Best First'.
Re: Selection of Hash key value pairs
by salva (Canon) on Mar 31, 2006 at 11:54 UTC
    If you have a huge number of elements in the hash, sorting them all can be an expensive and unnecessary operation when only the top 10% or 20% is going to be used later.

    In that case, using a heap data structure as implemented by Heap::Simple could be a better solution:

    use Heap::Simple;

    my %data = (...);
    my $heap = Heap::Simple->new(order => 'gt');
    $heap->insert(keys %data);
    for (1 .. int(0.1 * keys %data)) {
        my $key = $heap->extract_top;
        print "key: $key, value: $data{$key}\n";
    }
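Note that the heap above is ordered by key. If the ranking should go by value instead, and Heap::Simple is not installed, the same idea (never order more than the slice you need) can be sketched in core Perl. The helper name top_n_by_value and the sample data are made up for illustration; this is a sketch, not a tuned implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n keys with the largest values,
# keeping only a small sorted buffer instead of sorting every key.
sub top_n_by_value {
    my ($data, $n) = @_;
    my @best;    # keys held in descending order of value, at most $n long
    return @best if $n < 1;
    for my $key (keys %$data) {
        # Skip quickly once the buffer is full and this value cannot enter it.
        next if @best == $n && $data->{$key} <= $data->{ $best[-1] };
        # Walk back from the end of the buffer to find the insertion point.
        my $pos = @best;
        $pos-- while $pos > 0 && $data->{ $best[$pos - 1] } < $data->{$key};
        splice @best, $pos, 0, $key;
        pop @best if @best > $n;
    }
    return @best;
}

# Sample data (assumed); the values are distinct so the result is unambiguous.
my %data = (a => 5, b => 42, c => 17, d => 8, e => 23,
            f => 1, g => 99, h => 12, i => 3, j => 64);
my @top = top_n_by_value(\%data, int(0.2 * keys %data));
print "key: $_, value: $data{$_}\n" for @top;
```

With a buffer of size N this does at most N comparisons per key, which only pays off when N is small compared with the size of the hash.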
Re: Selection of Hash key value pairs
by lima1 (Curate) on Mar 31, 2006 at 10:09 UTC
    my $i = 0;
    my $cutoff = ( keys %h ) / 10;
    KEY: foreach my $key ( sort { $b <=> $a } keys %h ) {
        $i++;
        last KEY if $i > $cutoff;
        # print to file
    }
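The "print to file" placeholder could be filled in roughly as follows. This is a minimal sketch: the file name top10.txt and the sample hash are assumptions, and the keys are taken to be numeric because the loop sorts them with <=>.

```perl
use strict;
use warnings;

# Sample data with numeric keys (made up for illustration).
my %h = map { $_ => "item$_" } 1 .. 20;

my $outfile = 'top10.txt';             # hypothetical output file name
my $cutoff  = int( keys(%h) / 10 );    # 10% of the number of keys

# Three-argument open with a lexical filehandle; die with the OS error on failure.
open my $out, '>', $outfile or die "Cannot open $outfile: $!";
my $i = 0;
for my $key ( sort { $b <=> $a } keys %h ) {
    last if ++$i > $cutoff;
    print {$out} "$key $h{$key}\n";
}
close $out or die "Cannot close $outfile: $!";
```

Checking the return value of close matters for output files, since buffered write errors may only surface at that point.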
Re: Selection of Hash key value pairs
by arkturuz (Curate) on Mar 31, 2006 at 10:10 UTC
    The easiest way would be like this:

    use strict;

    my %hash = (
        a_one   => 1,
        b_two   => 2,
        c_three => 3,
        d_four  => 4,
        e_five  => 5,
        f_six   => 6,
        g_seven => 7,
        h_eight => 8,
        i_nine  => 9,
        j_ten   => 10
    );

    my $pairs = (keys %hash);
    my $counter = 0;
    my $amount_to_print = int(0.20 * $pairs);

    foreach my $key (reverse sort keys %hash) {
        if ($counter < $amount_to_print) {
            print $key, ' ', $hash{$key}, "\n";
        }
        else {
            last;
        }
        $counter++;
    }

    This will print:

    j_ten 10
    i_nine 9
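One thing to watch with int(0.20 * $pairs): for a hash with fewer than five pairs it yields 0, so nothing is printed. If at least one pair is wanted in that case, rounding up with ceil from the core POSIX module is one option. Whether rounding up is the right reading of "top 20%" is a judgment call, and the helper name top_count below is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: how many pairs make up $pct percent, rounded up.
sub top_count {
    my ($pairs, $pct) = @_;
    return ceil( $pairs * $pct / 100 );
}

# Compare rounding down and rounding up for a few hash sizes.
for my $pairs (3, 10, 14) {
    my $down = int( $pairs * 20 / 100 );
    my $up   = top_count($pairs, 20);
    print "$pairs pairs: int gives $down, ceil gives $up\n";
}
```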

    Maybe there's some better solution?

      A little shorter:

      my @keys_to_print = (reverse sort keys %hash)[0 .. int(0.20 * (keys %hash)) - 1];

      foreach my $key (@keys_to_print) {
          print_to_file($key, $hash{$key});
      }

      sub print_to_file {
          # print to file
      }
Re: Selection of Hash key value pairs
by johngg (Canon) on Mar 31, 2006 at 14:12 UTC
    You say the key/value pairs are sorted, but you do not specify whether you are sorting by key or by value. Assuming you mean by value, this would work:

    use strict;
    use warnings;

    # Initialise a hash with random values.
    #
    our %hash;
    $hash{$_} = int rand 500 for 0 .. (int rand 100) + 100;

    # Count our keys, calculate top 20% number, initialise
    # count of how many printed.
    #
    our $keyCt = scalar keys %hash;
    our $perc = 20;
    our $limit = int($keyCt * $perc / 100);
    our $iters = 0;

    # Feed in keys, map keys and values, sort by values
    # (descending, so the largest come first), grep out the
    # top however many and print key and value.
    #
    print
        map {"$_->[0] - $hash{$_->[0]}\n"}
        grep {$iters ++ < $limit}
        sort {$b->[1] <=> $a->[1]}
        map {[$_, $hash{$_}]}
        keys %hash;
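A plainer way to get a ranking by value, as a sketch: sort the keys by looking up their values in the comparison block, then take a list slice. The sample data is made up, and the variable names @sorted and @top are not from the code above.

```perl
use strict;
use warnings;

# Made-up sample data; the values are distinct so the order is unambiguous.
my %hash = (apple => 30, pear => 75, plum => 10, fig => 90, kiwi => 55,
            lime  => 20, date => 65, nut  => 5,  yam => 40, pea  => 85);

my $perc  = 20;
my $limit = int( keys(%hash) * $perc / 100 );

# Sort the keys by descending value, then slice off the first $limit of them.
my @sorted = sort { $hash{$b} <=> $hash{$a} } keys %hash;
my @top    = @sorted[ 0 .. $limit - 1 ];

print "$_ - $hash{$_}\n" for @top;
```

When $limit is 0 the range 0 .. -1 is empty, so the slice simply returns nothing.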

    I hope this is of use.

    Cheers,

    JohnGG

Re: Selection of Hash key value pairs
by radiantmatrix (Parson) on Mar 31, 2006 at 15:23 UTC

    Is it the keys or the values that are sorted? I will assume keys, but you can apply anything you like later:

    update – ww pointed out that @k/$pct only works when $pct == 10. I feel sheepish, but I've corrected the code below. :-/

    my @k = sort keys %the_hash;
    my $pct = 10; # what percent to print
    #- WRONG -# for ( 0..int(@k/$pct)-1 ) {
    for ( 0 .. int( @k * ($pct/100) ) - 1 ) {
        printf "%s => %s\n", $k[$_], $the_hash{$k[$_]}
    }

    Or, if you want to keep the top x% and discard the rest:

    my @k = sort keys %the_hash;
    my $pct = 10; # what percent to keep
    #- WRONG -# for ( int(@k/$pct)-1..$#k ) { delete $the_hash{$k[$_]} }
    # Indices 0 .. n-1 are kept, so deletion starts at index n.
    for ( int( @k * ($pct/100) ) .. $#k ) { delete $the_hash{$k[$_]} }
    # Now %the_hash only contains the top 10%
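The same pruning can also be written as a single hash-slice delete instead of a loop. A minimal sketch with made-up data; $keep is a name introduced here for the number of keys to retain, and "top" follows the sorted key order used above.

```perl
use strict;
use warnings;

my %the_hash = map { $_ => 1 } 'a' .. 'j';   # ten made-up keys
my @k    = sort keys %the_hash;
my $pct  = 20;                               # what percent to keep
my $keep = int( @k * $pct / 100 );           # number of keys to retain

# Delete everything after the first $keep keys with one hash-slice delete.
delete @the_hash{ @k[ $keep .. $#k ] };

print join(', ', sort keys %the_hash), "\n";
```

If $keep is larger than the last index, the range is empty and nothing is deleted.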
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet