Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re: More Sorted Business with Hashes

by mstone (Deacon)
on Dec 24, 2001 at 05:03 UTC ( [id://134129] : note . print w/replies, xml ) Need Help??

in reply to More Sorted Business with Hashes

The short answer is that you can't. The long answer deals with why you can't:

A 'hash function' reduces any input to a number within a given range. In most cases, this involves some kind of bit-twiddling:

$HASH_LIMIT = 1024; ## myhash (key:string) : int # # calculate a checksum # sub myhash { my $key = shift; my $h = 0; for $c (unpack ('c*', $key)) { ## break the key into chars $h *= 2; ## shift the hash value left one bit $h ^= $c; ## and XOR it with the next char } $h %= $HASH_LIMIT; ## trim to fit the defined range return ($h); ## and return the result } for $k (qw( testing the hash function )) { printf ("%10s - %s\n", $k, &myhash($k)); }

A 'hash' is a data structure that stores values in an array, using the value returned by a hash function as an array index:

@MY_HASH = (0) x $HASH_LIMIT; ## put (key:string, val:string) : nil # # store a value in a hash # sub put { my $key = shift; my $val = shift; $MY_HASH[ &myhash($key) ] = $val; return; } ## get (key:string) : string # # get a value from the hash # sub get { my $key = shift; return ($MY_HASH[ &myhash($key) ]); }

and this code is admittedly trivial because it doesn't check for collisions.. cases where two keys return the same hash value. Collisions are inevitable when you deal with hash functions, because hash functions map an infinite domain of input strings to a finite range of output values. It's just hard (very hard) to calculate which strings will map to a given hash value. Most of the interesting work involving hashes revolves around what to do with collisions, but there are too many alternatives to discuss here.

Hashes are great when you want to work with non-numeric information. Calculating the index value directly from the key is (usually) faster than searching for a specific string in a sorted array. It's also (usually) easier to store a new value in a hash than to shoehorn one into a sorted array, though there are data structures (heaps) which do just that.

The down side is that you can't assume your hash function will preserve attributes like alphabetical order. It's just a tradeoff you just have to accept if you're going to store information in a hash.