If it were not for pfaut's unpack suggestion, I almsot certainly would be using it, but the loss of performance of the perl implimentation relative to the C switch version means pushing as much of the work into perl as I can has dividends.

The original C code is here.

My current best perl implementation of the while thing is the hash2() in the test code below. hash() being my original brute force conversion. The two version appear to function the same as each other, with the unpack version coming out about %50 faster.

However, the number of first pass collisions worries me a little. It could be just be a function of the limited range of inputs, but its higher than I expected. The other possibility is that both implementations are equally wrong. If you (or anyone) has a few moments to compare the perl and the C versions and point out any obvious differences I'd be grateful.

Also, if anyone has any idea's or pointers for methods of testing the goodness of this type of hashing function I'd really like to hear/see them. Unfortunately, the vast majority of the hits on google relate to hashing fuctions used for cryptography which is a completly different ballpark. I'm slowly whittling them down and wading through them, but haven't found anything applicable yet.

My test code

#! perl -slw use strict; use constant GOLDEN_RATIO => 0x9e3779b9; use constant A => 0; use constant B => 1; use constant C => 2; sub mix ($$$) { $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>13); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<< 8); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>13); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>12); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<16); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>> 5); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>> 3); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<10); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>15); } sub hash { my ($key, $init) = @_; my @k = unpack 'C*', $key; my $len = length $key; my ($a, $b, $c) = (GOLDEN_RATIO, GOLDEN_RATIO, $init); while($len >=12) { $a += ($k[0] + ($k[1]<<8) + ($k[2]<<16) + ($k[3]<24)); $b += ($k[4] + ($k[5]<<8) + ($k[6]<<16) + ($k[7]<24)); $a += ($k[8] + ($k[9]<<8) + ($k[10]<<16) + ($k[11]<24)); mix($a,$b,$c); splice(@k, 0, 11); $len -= 12; } $c += length $key; goto "LAST".$len; LAST11: $c+= $k[10] <<24; LAST10: $c+= $k[9] <<16; LAST9: $c+= $k[8] <<8; LAST8: $b+= $k[7] <<24; LAST7: $b+= $k[6] <<16; LAST6: $b+= $k[5] <<8; LAST5: $b+= $k[4]; LAST4: $a+= $k[3] <<24; LAST3: $a+= $k[2] <<16; LAST2: $a+= $k[1] <<8; LAST1: $a+= $k[0]; LAST0: mix($a,$b,$c); return $c; } sub hash2 { my ($key,$init) = @_; my($a,$b,$c) = (GOLDEN_RATIO, GOLDEN_RATIO, $init); my ($p, $len) = (0, length $key); do { my($x,$y,$z) = unpack 'V3', substr($key, $p, 12); mix($a+=($x||0), $b+=($y||0), $c+=($z||$len)); $p+=12; } while $p <= $len; return $c; } my %freq; my $n=0; $n++, push @{$freq{hash( $_, 0)}}, $_ for 'AAAA' .. 'ZZZZ'; print $n, ':', scalar keys %freq; undef(%freq); $n=0; $n++, push @{$freq{hash2( $_, 0)}}, $_ for 'AAAA' .. 'ZZZZ'; print $n, ':', scalar keys %freq;

Benchmark results

c:\test>hash 456976 values resulted in 233710 unique hashes & a maximum of 34 first + pass collisions 456976 values resulted in 233710 unique hashes & a maximum of 34 first + pass collisions 1 trial of hash (195.751s total) 1 trial of hash2 (133.032s total)

Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.


In reply to Re: Re: Eek! goto? by BrowserUk
in thread Eek! goto? by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.