Re: Given ref to hash value, retrieve the key

Hello all around and thank you for your comments. Several contain excellent suggestions, and some make the larger (and correct) point that I'm barking up the wrong tree. Time to re-think the data structure.

The root issue is that I'm trying to use a hash to implement a C-style struct with pointers. Bad karma to think this way in the Monastery, no?

Background: I'm trying to implement an n-level Markov chain that's efficient enough to handle large data sets. My hope was to use the words in the chain as hash keys, then build the rest of the data structure (except the weights of following words, which are ints) using arrays and references. There is another node dealing with the general issue of Markov chains, so I'll hold off on posting more here. Anyone who's interested in the topic, though, is welcome to join me over there. I'll post what I come up with (after doing some more thinking) then look forward to feedback.

Once again, thank you to everyone who posted. These were very helpful comments.

Comment on Re: Given ref to hash value, retrieve the key

Replies are listed 'Best First'.
Re: Given ref to hash value, retrieve the key by jonadab (Parson) on May 24, 2003 at 04:00 UTC
Your initial request immediately made me think of the lisp function called rassq, which operates on an alist -- which is very similar to a hash but implemented differently; because of the way Perl's hashes are implemented, forward lookups are very fast, but reverse lookups are relatively expensive (as you've discovered). With alists, forward and reverse lookups are roughly equally expensive. It is relatively easy to implement an alist in Perl. (The following code is untested... `my @alist = ( [key1 => value1], [key2 => value2], [key3 => value3], ); sub assq { my ($alistref, $key) = @_; for (@$alistref) { return $_ if $$_[0] eq $key; # eq doesn't mean exactly the same thing as in lisp, # but don't worry about that if you don't know lisp. } } sub rassq { my ($alistref, $value) = @_; for (@$alistref) { return $_ if $$_[1] eq $value; # eq doesn't mean exactly the same thing as in lisp, # but don't worry about that if you don't know lisp. } }` [download] If you are working with large alists, assq will be much slower than the equivalent hash lookup. You probably can improve it slightly be replacing foreach with an ugly C-style for loop. I also don't know whether rassq will be significantly faster than doing the equivalent thing on a hash; what I do know is that it will be just as fast as assq. If you want to compare the alist rassq with the equivalent reverse hash lookup, I suggest using Benchmark. Here's the closest thing I can come up with for doing a rassq-like operation on a hash: `sub hashrassq { my ($hashref, $value) = @_; foreach (keys %$hashref) { return $_ if $$hashref{$_} eq $value; } }` [download] Also note that there are significant differences if more than one hash key has the same value; the alist implementation will always return the first one as defined in terms of the order in which the associations were added to the list (unless you add by unshifting or similar), but the hashrassq will pick one (for practical purposes) arbitrarily. You can define an order by sorting the key list for the foreach loop, but this does not allow you to take the first one added to the hash, since hashes don't store order like that. Also note that the hash implementation may be more efficient if you iterate over the hash using each, just as the alist may be more efficient if you iterate by index. Only way to be sure is to use Benchmark.	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re: Given ref to hash value, retrieve the key
by jonadab (Parson) on May 24, 2003 at 04:00 UTC

Your initial request immediately made me think of the lisp function called rassq, which operates on an alist -- which is very similar to a hash but implemented differently; because of the way Perl's hashes are implemented, forward lookups are very fast, but reverse lookups are relatively expensive (as you've discovered). With alists, forward and reverse lookups are roughly equally expensive.

It is relatively easy to implement an alist in Perl. (The following code is untested...

my @alist = (
  [key1 => value1],
  [key2 => value2],
  [key3 => value3],
);

sub assq {
  my ($alistref, $key) = @_;
  for (@$alistref) {
     return $_ if $$_[0] eq $key;
     # eq doesn't mean exactly the same thing as in lisp,
     # but don't worry about that if you don't know lisp.
  }
}
sub rassq {
  my ($alistref, $value) = @_;
  for (@$alistref) {
     return $_ if $$_[1] eq $value;
     # eq doesn't mean exactly the same thing as in lisp,
     # but don't worry about that if you don't know lisp.
  }
}
[download]

If you are working with large alists, assq will be much slower than the equivalent hash lookup. You probably can improve it slightly be replacing foreach with an ugly C-style for loop.

I also don't know whether rassq will be significantly faster than doing the equivalent thing on a hash; what I do know is that it will be just as fast as assq. If you want to compare the alist rassq with the equivalent reverse hash lookup, I suggest using Benchmark. Here's the closest thing I can come up with for doing a rassq-like operation on a hash:

sub hashrassq {
  my ($hashref, $value) = @_;
  foreach (keys %$hashref) {
     return $_ if $$hashref{$_} eq $value;
  }
}
[download]

Also note that there are significant differences if more than one hash key has the same value; the alist implementation will always return the first one as defined in terms of the order in which the associations were added to the list (unless you add by unshifting or similar), but the hashrassq will pick one (for practical purposes) arbitrarily. You can define an order by sorting the key list for the foreach loop, but this does not allow you to take the first one added to the hash, since hashes don't store order like that. Also note that the hash implementation may be more efficient if you iterate over the hash using each, just as the alist may be more efficient if you iterate by index. Only way to be sure is to use Benchmark.

[reply]
[d/l]
[select]