vinforget has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, Quick question: What would be the most efficient way (apart from a simple search) of testing if a sortable key exist in a hash AND (here's the real question ) if it doesn't to get the next highest one. So,
%hash = ('1' => cat', '3'=> 'mouse', '5'=>'whale' ); if $hash{2} defined then return $hash{2} else return next element sorted by hash key, in this case, $hash{3}
without going through all the trouble of making a for loop to check each element after '2' until find one defined. Thanks for the help Vince

Replies are listed 'Best First'.
Re: retrieve next avaiable element in sorted hash
by thelenm (Vicar) on Oct 06, 2003 at 21:40 UTC

    Nothing is jumping out at me from CPAN, except maybe Tie::RangeHash, which you might find a way to use with what you're doing. If not, here's a solution (not particularly efficient though):

    sub get_fuzzy { my ($k, $h) = @_; return $h->{$k} if exists $h->{$k}; for (sort keys %$h) { return $h->{$_} if $_ gt $k; # if keys are numeric, change "g +t" to ">" } return undef; # if key is higher than the highest } my %hash = (a=>1, c=>3, e=>5, g=>7, i=>9); print "|", join('|', map { get_fuzzy($_, \%h) } a..j), "|\n";

    It's inefficient because it sorts all the hash keys each time there is a lookup and does a linear search through them. Also, I'm not sure what you want to do when looking up a value larger than the largest hash key. In this case, my code returns undef.

    If you want efficiency, you may want to tie a class of your own that maintains the keys in sorted order, perhaps even indexed so you could do a binary search instead of a linear one.

    -- Mike

    --
    XML::Simpler does not require XML::Parser or a SAX parser. It does require File::Slurp.
    -- grantm, perldoc XML::Simpler

      thelenm,
      I was thinking about adding a new method to Tie::Hash::Sorted after reading this thread as it already stores the hash keys in a sorted array. A person may want to indicate where in the sorted hash they wanted to start from on a case by case basis. After giving it some thought, it won't help here.

      The problem is that what is being asked for is not they next key, but the next key with a defined value. Even with the binary search, you still need to iterate after you find your target until you find a defined key value. My idea is a variation that should get the job done.

      Cheers - L~R

        Even with the binary search, you still need to iterate after you find your target until you find a defined key.

        Not so. Its trivial to adjust binary search to handle "find the node immediately preceding/following my value or the value itself" searches. If you already know one endpoint linear search makes sense, as does using it as the left boundary and range searching the appropriate side.


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi


Re: retrieve next avaiable element in sorted hash
by cLive ;-) (Prior) on Oct 06, 2003 at 22:19 UTC
    I don't think you can avoid a loop of some sort - either way you have to work out the index of the element in the sorted keys.

    Here's my quick stab:

    #!/usr/bin/perl use strict; use warnings; my %hash = ('1' => 'cat', '3' => 'mouse', '5' => 'whale', '12' => 'dog', ); my $top_hash_key = (sort {$a<=>$b} keys %hash)[-1]; my $next_val = retrieve(2); print "retrieve 2: $next_val\n"; my $other_val = retrieve(8); print "retrieve 8: $other_val\n"; my $too_high = retrieve(15); print "retrieve 15: $too_high\n"; sub retrieve { my $i = $_[0]; 1 while (!defined $hash{$i} and $i++<$top_hash_key); return defined $hash{$i} ? $hash{$i} : ''; }

    .02

    cLive;-)

Re: retrieve next avaiable element in sorted hash
by jonadab (Parson) on Oct 07, 2003 at 00:38 UTC

    Instead of a hash, you may want to use a data structure that maintains an order. An alist (a concept borrowed from lisp) seems suitable...

    my @alist = ( [1 => 'cat'], [3 => 'mouse'], [5 => 'whale'], ); alistinsert(\@alist, 2, 'porcupine'); my $pair = alistatleast(\@alist, 4); # $pair now contains [5, 'whale'] print "The first element of the alist with a key of at least 4 is ($ +$pair[0], $$pair[1])\n"; sub alistatleast { my ($alist, $minval) = @_; for (@$alist) { return $_ if $$_[0] >= $minval; } } sub alistinsert { my ($alist, $car, $cdr) = @_; my $posn = 0; for (@$alist) { ++$posn if $$_[0] < $car } splice @$alist, $posn, 0, [$car, $cdr]; }

    This saves you from doing a sort for every lookup, which will help performance if you have a lot of elements, taking a worst-case lookup down to O(n) time (from at least O(n log n) time if not more with the sort). It is possible to do better, by making the search binary instead of linear, which should get you down to O(log n) time I think. If your lookups have to be faster than that, you have to take drasting measures that will cost in other areas. e.g., you could have all possible keys be defined, or all keys that are multiples of some value, but if they don't have their own value give them a reference to the next one up with a real value. That gets you down to O(1) time for lookups, but it costs elsewise, by raising storage requirements and greatly complicating your code, among other things.


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: retrieve next avaiable element in sorted hash
by InfiniteSilence (Curate) on Oct 06, 2003 at 22:17 UTC
    At first glance I would say:
    #!/usr/bin/perl -w use strict; my %hash = ('1' => 'cat', '3'=> 'mouse', '5'=>'whale' ); my $elem = 2; if (defined($hash{2})) {print $hash{2}} else { print $hash{$elem+1}}; 1;
    But at second glance I realize that $elem+1 may not exist either in which case you should just failout like so:
    #!/usr/bin/perl -w use strict; my %hash = ('1' => 'cat', '3'=> 'mouse', '5'=>'whale' ); my $elem = 2; if (defined($hash{$elem})) { print $hash{$elem} } elsif (defined $hash{$elem + 1} ) { print $hash{$elem+1} } else { print 'ERROR!!!!!!'; } 1;

    Celebrate Intellectual Diversity

Re: retrieve next avaiable element in sorted hash
by Limbic~Region (Chancellor) on Oct 06, 2003 at 23:35 UTC
    vinforget,
    It seems to me that what makes this problem difficult to solve without a loop is that you want the next key after a specified key that is defined. I do not know how much more efficient the following logic would be then a loop. I imagine it depends on the size of the hash and the location of the key.

  • If the desired key does not exist, return undef
  • If the desired key is defined, return the value
  • If not, create an array of sorted hash keys that are defined + target key
    my @keys = grep { defined $hash{$_} || $_ eq $key } sort keys %hash;
  • Use a binary search to find the index of the target key and return the array element one higher
  • It is your choice if it should loop around to the first key if it is the last element or return undef

    I am too tired at the moment (just started a new job) to actually write this code, but the binary search should be noticeably faster on large hashes especially if the target key is not at the beginning. If you are interested I will update it later.

    Cheers - L~R

Re: retrieve next avaiable element in sorted hash
by tilly (Archbishop) on Oct 07, 2003 at 06:37 UTC
    While a hash is a bad data structure for this, a BTREE looks a lot like a hash programmatically, but this question is easy for a BTREE. Unfortunately in the most convenient BTREE implementation to use from Perl, the API is a little convoluted:
    #! /usr/bin/perl use strict; use DB_File; tie my %hash, 'DB_File', undef, O_CREAT|O_RDWR, 0666, $DB_BTREE; %hash = ( '1' => 'cat', '3' => 'mouse', '5' => 'whale' ); tied(%hash)->seq(my $key = 2, my $ans, R_CURSOR); print "Found key $key, answer $ans\n";
    The main reason for this API being so complicated is that you generally want somehow to indicate both the key and value discovered. This doesn't fit with the appearance a basic hash lookup. Therefore the attempt that tie makes to turn something else into an illusion of a hash has to break down..badly.
Re: retrieve next avaiable element in sorted hash
by demerphq (Chancellor) on Oct 06, 2003 at 23:50 UTC

    Quick question: What would be the most efficient way (apart from a simple search) of testing if a sortable key exist in a hash AND (here's the real question ) if it doesn't to get the next highest one.

    This isn't a quick question at all. Essentially its not an operation that hashes are suitable for. Depending on how often you fetch and how often you store will determine the best way to do it. If the data is relatively static and you are looking at optimising look up times then I would consider a pair of arrays, one with the keys, they other with the values. Binary range searching isn't difficult and makes look up times fast. OTOH, if you have a good ratio of stores to fetches then a tree structure will be more suitable. Some hybrid may also be preferable.

    PS: Did you really mean defined there? Did you actually mean exists?


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      Exists would be more appropriate
Re: retrieve next avaiable element in sorted hash
by jmanning2k (Pilgrim) on Oct 07, 2003 at 17:35 UTC
    Why don't you keep a pre-sorted index array of all the hash keys along with your hash. This would only have to be regenerated when you add or remove hash elements.

    Then, you can use a binary search to look up your hash key in the index array and return the next largest key.

    This eliminates the sort involved with each lookup, which I would guess is the biggest slowdown to your current code.

    It also changes your search from simple to binary, which should cut the search time by an order of magnitude.

      Thanks. This seems like a good general approach. Just what I was looking for. Vince