A simpler fix is just to swap to bit-shifts instead of multiplying by powers of two (Shift Operators):
$hash = ($c + ($hash << 6) + ($hash << 16) - $hash) & 0xffffffff;
$hash = ($c + (($hash & 0x03ffffff) << 6) + (($hash & 0x0000ffff) << 16) - $hash) & 0xffffffff;
This has the advantage of making behavior independent of machine word length (assuming at least 32-bit, of course).
Update: Modified as per ikegami's comment below. For some reason I thought Perl handled bit-shift overflow in a standardized fashion by dropping bits.
Except the above is subject to arithmetic overflow once the addition stage is reached, with maximum error at $c=0xFFFFFFFF;$hash=0x0FFFFFFF;. Short of implementing a Full Adder, I think bit operations are not really reasonable. I give you the portable, overflow-immune hash function:
sub ModifiedMakeHashString($) {
my $string = shift;
my $hash = 0;
for my $c (map {ord} split //,$string) {
my $six_shift = ($hash & 0x03ffffff) << 6;
my $sixteen_shift = ($hash & 0x0000ffff) << 16;
my $inverse = ~($hash-1) & 0xffffffff;
$hash = floored_addition(
floored_addition($c, $six_shift),
floored_addition($sixteen_shift, $inverse));
#printf "c = %09x, 6s = %09x, 16s = %09x, inv = %09x, hash = %
+09x\n",
# $c, $six_shift, $sixteen_shift, $inverse, $hash;
#printf "c = %d, hash = %d\n", $c, $hash;
}
return $hash;
}
sub floored_addition {
my ($x,$y) = @_;
for my $i (reverse (0 .. 31)) {
my $check_mask = 1 << $i;
if ($check_mask & $x & $y) {
my $op_mask = ~(2**$i-1) & 0xffffffff;
$x ^= $op_mask&$x;
$y ^= $op_mask&$y;
last;
}
last if $check_mask & ~$x & ~$y;
}
return $x + $y;
}
It runs about 10x slower than the original. Any obvious algorithmic improvements, other than the obvious? |