in reply to Fast string hash in portable perl?

Update: 14 years after posting huck shows that this code is broken!

You'd best do your own testing of this before you consider using it, to convince yourself it is a faithful implementation of the original.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Replies are listed 'Best First'.
Re^2: Fast string hash in portable perl?
by huck (Prior) on Nov 16, 2017 at 13:05 UTC

      Post above modified to reflect your findings! Many thanks.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
      In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Re: Re: Fast string hash in portable perl?
by tachyon (Chancellor) on Dec 20, 2003 at 03:36 UTC

    We have this hashing function (Jenkins Hash) as an XS extension, yes you have to compile it (we have compiled on Win32 and Linux without issues) but the result is of course much faster. You can get a copy from http://tachyon.perlmonk.org/Digest-JHash-0.02.tar.gz

    We did a fair bit of dispersion testing on it and found it to be pretty damn good using URLs as the input data which is what we wanted it for. The dispersion peaks at somewhere around 90%+ of slots from memory.

    Maybe we should post Digest::JHash and Digest::JHahs::PurePerl to CPAN?. Here it is as XS (Digest::JHash.XS):

    #include "EXTERN.h" #include "perl.h" #include "XSUB.h" /* Jenkins Hash http://burtleburtle.net/bob/hash/doobs.html */ const int DEBUG = 0; #define MIX(a,b,c) \ { \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<<8); \ c -= a; c -= b; c ^= (b>>13); \ a -= b; a -= c; a ^= (c>>12); \ b -= c; b -= a; b ^= (a<<16); \ c -= a; c -= b; c ^= (b>>5); \ a -= b; a -= c; a ^= (c>>3); \ b -= c; b -= a; b ^= (a<<10); \ c -= a; c -= b; c ^= (b>>15); \ } unsigned long jhash( SV* str ) { STRLEN rawlen; char* p; unsigned long a, b, c, len, length; /* extract the string data and string length from the perl scalar +*/ p = (char*)SvPV(str, rawlen); length = len = (unsigned long)rawlen; /* Test for undef or null string case and return 0 */ if ( length == 0 ) { DEBUG && printf( "Recieved a null or undef string!\n" ); return 0; } DEBUG && printf( "Received string '%.*s'.\n", (int)len, p ); a = b = 0x9e3779b9; /* golden ratio suggested by Jenkins */ c = 0; while (len >= 12) { a += (p[0] + (((unsigned long)p[1])<<8) + (((unsigned long)p[2 +])<<16) + (((unsigned long)p[3])<<24)); b += (p[4] + (((unsigned long)p[5])<<8) + (((unsigned long)p[6 +])<<16) + (((unsigned long)p[7])<<24)); c += (p[8] + (((unsigned long)p[9])<<8) + (((unsigned long)p[1 +0])<<16) + (((unsigned long)p[11])<<24)); MIX(a, b, c); p += 12; len -= 12; } c += length; switch(len) { case 11: c+=((unsigned int)p[10]<<24); case 10: c+=((unsigned int)p[9]<<16); case 9: c+=((unsigned int)p[8]<<8); case 8: b+=((unsigned int)p[7]<<24); case 7: b+=((unsigned int)p[6]<<16); case 6: b+=((unsigned int)p[5]<<8); case 5: b+=((unsigned int)p[4]); case 4: a+=((unsigned int)p[3]<<24); case 3: a+=((unsigned int)p[2]<<16); case 2: a+=((unsigned int)p[1]<<8); case 1: a+=((unsigned int)p[0]); } MIX(a, b, c); DEBUG && printf( "Hash value is %d.\n", (int)(c) ); return(c); } MODULE = Digest::JHash PACKAGE = Digest::JHash unsigned long jhash(str) SV* str # Digest::JHash.pm package Digest::JHash; use strict; require Exporter; require DynaLoader; our @ISA = qw(Exporter DynaLoader); our @EXPORT_OK = qw( jhash ); our $VERSION = '0.02'; bootstrap Digest::JHash $VERSION; 1; =head1 NAME Digest::JHash - Perl extension for JHash Hashing Algoritm =head1 SYNOPSIS use Digest::JHash qw(jhash); $digest = jhash($data); # note that calling jhash() directly like this is the fastest way: $digest = Digest::JHash::jhash($data); =head1 DESCRIPTION The C<Digest::JHash> module allows you to use the fast JHash hashing a +lgorithm developed by Bob Jenkins from within Perl programs. The algorithm take +s as input a message of arbitrary length and produces as output a 32-bit "message digest" of the input in the form of an unsigned long integer. Call it a low calorie version of MD5 if you like. See http://burtleburtle.net/bob/hash/doobs.html for more information. =head1 FUNCTIONS =over 4 =item jhash($data) This function will calculate the JHash digest of the "message" in $dat +a and return a 32 bit integer result (an unsigned long in the C) =back =head1 EXPORTS None by default but you can have the jhash() function if you ask nicel +y. See below for reasons not to use Exporter (it is slower than a direct +call) =head1 SPEED NOTE If speed is a major issue it is roughly twice as fast to do call the j +hash() function like Digest::JHash::jhash('message') than it is to import the jhash() method using Exporter so you can call it as simply jhash('mess +age'). There is a short script that demonstrates the speed of differecnt call +ing methods (direct, OO and Imported) in misc/oo_vs_func.pl =head1 AUTHORS The JHash implementation was written by Bob Jenkins (C<bob_jenkins@burtleburtle.net>). This perl extension was written by Andrew Towers (C<mariofrog@bigpond.com>). A few mods were added by James Freeman (C<james.freeman@id3.org.uk>). =head1 SEE ALSO http://burtleburtle.net/bob/hash/doobs.html =cut

    cheers

    tachyon

      Thanks tachyon, that's great.

      I did compile the original C code as a C program and use it for comparisons with the perl version when I was working on this, though that was some time ago. The C was obviously much faster.

      Unfortunately, despite my best efforts, I still live in the "nothing I must compile myself" perl world.

      I do have a version of 5.8.1 that I compiled with Borland that works, but every time I've tried to use modules that require compilation in conjunction with that build, I spend hours or days trying to hack the build into working. I'm ashamed to say that I gave up and went back to being a AS dependant.

      Thanks to the number of kind folks around that make binary versions of those modules that AS haven't gotten around to yet, I rarely encounter a module that I cannot install and use in short order, given the appropriate googling.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        To compile under AS Perl all you really need is M$ VCC of some variety and A Practical Guide to Compiling C based Modules under ActiveState using Microsoft C++. You just emulate the AS build environment (aka copy all the missing M$ headers etc into Perl/lib/CORE), fixup the environment by running VSVARS32.BAT (Last seen in C:\Program Files\Microsoft Visual Studio .NET\Common7\Tools but you know M$ ;-), occasionally have to copy a DLL into System32 for reasons that totally elude me (and I have never bothered to work out ;-) and nmake and CL.EXE will happily compile all sorts of 'basic' XS stuff ie any vanilla C generally works.

        Now I know VCC costs money but it does make Perling on Win32 using Active State Perl a helluva lot easier. You only need an old copy - but you can pick it up new on ebay for between $50-100 US. There were 3 X Visual C++.NET sealed in box for 50 quid buy now a few minutes ago.

        cheers

        tachyon