Below is a pure perl version of the lookup2 hash by Bob Jenkins as talked about at A Hash Function for Hash Table Lookup.

Edit: This is an updated version, in particular the mix4 mentioned at Re: Pure perl Jenkins 32 bit Hash is now called mix4x and may be discarded. Also there is code to select the proper version based on $Config{ivsize} and a test call.

{ # has use integer/bytes use integer; use bytes; # http://www.perlmonks.org/?node_id=315881 # http://burtleburtle.net/bob/c/lookup2.c # http://burtleburtle.net/bob/hash/doobs.html # http://search.cpan.org/~shlomif/Digest-JHash/lib/Digest/JHash.pm # http://cpansearch.perl.org/src/SHLOMIF/Digest-JHash-0.10/JHash.xs */ + use constant GOLDEN_RATIO => 0x9e3779b9; use constant A => 0; use constant B => 1; use constant C => 2; use constant FFFFFFFF => 0xffffffff; use constant KEY => 0; use constant INITHASH => 1; sub mix4 ($$$) { # 32bit version # per http://www.perlmonks.org/?node_id=1203705 this is a revised 32bi +t under 'use integer'; $_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>>13) +; } $_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<< 8) +; } $_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>>13) +; } $_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>>12) +; } $_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<<16) +; } $_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>> 5) +; } $_[A] -= $_[B]; $_[A] -= $_[C]; { no integer; $_[A] ^= ($_[C]>> 3) +; } $_[B] -= $_[C]; $_[B] -= $_[A]; { no integer; $_[B] ^= ($_[A]<<10) +; } $_[C] -= $_[A]; $_[C] -= $_[B]; { no integer; $_[C] ^= ($_[B]>>15) +; } } sub mix4x ($$$) { # per http://www.perlmonks.org/?node_id=1203705 this is wrong $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>13); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<< 8); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>13); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>>12); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<16); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>> 5); $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] ^= ($_[C]>> 3); $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] ^= ($_[A]<<10); $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] ^= ($_[B]>>15); } sub mix8 ($$$) { # 64bit version $_[A] &= FFFFFFFF; $_[B] &= FFFFFFFF; $_[C] &= FFFFFFFF; $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>>13) ) & + FFFFFFFF; $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<< 8) ) & + FFFFFFFF; $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>>13) ) & + FFFFFFFF; $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>>12) ) & + FFFFFFFF; $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<<16) ) & + FFFFFFFF; $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>> 5) ) & + FFFFFFFF; $_[A] -= $_[B]; $_[A] -= $_[C]; $_[A] = ( $_[A] ^ ($_[C]>> 3) ) & + FFFFFFFF; $_[B] -= $_[C]; $_[B] -= $_[A]; $_[B] = ( $_[B] ^ ($_[A]<<10) ) & + FFFFFFFF; $_[C] -= $_[A]; $_[C] -= $_[B]; $_[C] = ( $_[C] ^ ($_[B]>>15) ) & + FFFFFFFF; } sub jhash_pp_hex { my ($a, $b, $c) = ( GOLDEN_RATIO, GOLDEN_RATIO, $_[INITHASH] ); my ($p, $length) = (0, length $_[KEY]); my $len=$length; my($x,$y,$z); while ($len>=12) { ($x,$y,$z) = unpack 'LLL', substr($_[KEY], $p, 12); $a+=$x;$b+=$y;$c+=$z; mix($a, $b, $c); $p += 12; $len-=12; } # even if len==0 we need another round to mix in the length ($x,$y,$z) = unpack 'LLL', substr($_[KEY] . (chr(0)x12), $p, 1 +2); $z<<=8; # /* the first byte of c is reserved for the length * +/ $z+=$length; $a+=$x;$b+=$y;$c+=$z; mix($a, $b, $c); my $hex = unpack("H*", pack("N", $c)); return $hex; } # jhash_pp_hex use Config; if ( $Config{ivsize} == 4 ) { *main::mix=*main::mix4; } else { *main::mix=*main::mix8; } } # has use integer/bytes print jhash_pp_hex('abcdef',0)."\n";

I had a situation where i could not use Digest::JHash because i did not have access to a compiler.

I needed to hash filenames for a filetracking program using a mysql database. Rather than carry the filename in 4+ tables i use a unique fnid assigned in a fnid table that holds the filename in a text bucket.

As you cannot really index a text field i was originally using md5 on the filename as a key. I knew there can be collisions but that to select a single filename SELECT fnid FROM fnid WHERE fnmd5=? and fn=? would do ok if fnmd5 was a index.

But md5 in hex is 32 chars, and that was real big, bigger than i needed. So i searched for 32 bit hashs, found the jenkins variants and found Re: Fast string hash in portable perl? [DO NOT USE!] by BrowserUk so i decided to try that. My disappointment will be described in a separate reply thread about the history below [history] Pure perl Jenkins 32 bit Hash.

This also calls into question what to do about Digest::JHash, again i will use a separate reply thread [Digest::JHash problem] Pure perl Jenkins 32 bit Hash to talk about that.

Overall i am happy with this. I realize there are newer jenkins and other 32bit hashs, but this will do. It reduced the phpmyadmin reported size of the fnid table by 50%. It is fast enough for my needs, it seems to be the hash perl uses for its hashs.


In reply to Pure perl Jenkins 32 bit Hash by huck

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.