in reply to Re: Fast string hash in portable perl? [DO NOT USE!]
in thread Fast string hash in portable perl?

We have this hashing function (Jenkins Hash) as an XS extension, yes you have to compile it (we have compiled on Win32 and Linux without issues) but the result is of course much faster. You can get a copy from http://tachyon.perlmonk.org/Digest-JHash-0.02.tar.gz

We did a fair bit of dispersion testing on it and found it to be pretty damn good using URLs as the input data which is what we wanted it for. The dispersion peaks at somewhere around 90%+ of slots from memory.

Maybe we should post Digest::JHash and Digest::JHahs::PurePerl to CPAN?. Here it is as XS (Digest::JHash.XS):

#include "EXTERN.h" #include "perl.h" #include "XSUB.h" /* Jenkins Hash http://burtleburtle.net/bob/hash/doobs.html */ const int DEBUG = 0; #define MIX(a,b,c) \ { \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<<8); \ c -= a; c -= b; c ^= (b>>13); \ a -= b; a -= c; a ^= (c>>12); \ b -= c; b -= a; b ^= (a<<16); \ c -= a; c -= b; c ^= (b>>5); \ a -= b; a -= c; a ^= (c>>3); \ b -= c; b -= a; b ^= (a<<10); \ c -= a; c -= b; c ^= (b>>15); \ } unsigned long jhash( SV* str ) { STRLEN rawlen; char* p; unsigned long a, b, c, len, length; /* extract the string data and string length from the perl scalar +*/ p = (char*)SvPV(str, rawlen); length = len = (unsigned long)rawlen; /* Test for undef or null string case and return 0 */ if ( length == 0 ) { DEBUG && printf( "Recieved a null or undef string!\n" ); return 0; } DEBUG && printf( "Received string '%.*s'.\n", (int)len, p ); a = b = 0x9e3779b9; /* golden ratio suggested by Jenkins */ c = 0; while (len >= 12) { a += (p[0] + (((unsigned long)p[1])<<8) + (((unsigned long)p[2 +])<<16) + (((unsigned long)p[3])<<24)); b += (p[4] + (((unsigned long)p[5])<<8) + (((unsigned long)p[6 +])<<16) + (((unsigned long)p[7])<<24)); c += (p[8] + (((unsigned long)p[9])<<8) + (((unsigned long)p[1 +0])<<16) + (((unsigned long)p[11])<<24)); MIX(a, b, c); p += 12; len -= 12; } c += length; switch(len) { case 11: c+=((unsigned int)p[10]<<24); case 10: c+=((unsigned int)p[9]<<16); case 9: c+=((unsigned int)p[8]<<8); case 8: b+=((unsigned int)p[7]<<24); case 7: b+=((unsigned int)p[6]<<16); case 6: b+=((unsigned int)p[5]<<8); case 5: b+=((unsigned int)p[4]); case 4: a+=((unsigned int)p[3]<<24); case 3: a+=((unsigned int)p[2]<<16); case 2: a+=((unsigned int)p[1]<<8); case 1: a+=((unsigned int)p[0]); } MIX(a, b, c); DEBUG && printf( "Hash value is %d.\n", (int)(c) ); return(c); } MODULE = Digest::JHash PACKAGE = Digest::JHash unsigned long jhash(str) SV* str # Digest::JHash.pm package Digest::JHash; use strict; require Exporter; require DynaLoader; our @ISA = qw(Exporter DynaLoader); our @EXPORT_OK = qw( jhash ); our $VERSION = '0.02'; bootstrap Digest::JHash $VERSION; 1; =head1 NAME Digest::JHash - Perl extension for JHash Hashing Algoritm =head1 SYNOPSIS use Digest::JHash qw(jhash); $digest = jhash($data); # note that calling jhash() directly like this is the fastest way: $digest = Digest::JHash::jhash($data); =head1 DESCRIPTION The C<Digest::JHash> module allows you to use the fast JHash hashing a +lgorithm developed by Bob Jenkins from within Perl programs. The algorithm take +s as input a message of arbitrary length and produces as output a 32-bit "message digest" of the input in the form of an unsigned long integer. Call it a low calorie version of MD5 if you like. See http://burtleburtle.net/bob/hash/doobs.html for more information. =head1 FUNCTIONS =over 4 =item jhash($data) This function will calculate the JHash digest of the "message" in $dat +a and return a 32 bit integer result (an unsigned long in the C) =back =head1 EXPORTS None by default but you can have the jhash() function if you ask nicel +y. See below for reasons not to use Exporter (it is slower than a direct +call) =head1 SPEED NOTE If speed is a major issue it is roughly twice as fast to do call the j +hash() function like Digest::JHash::jhash('message') than it is to import the jhash() method using Exporter so you can call it as simply jhash('mess +age'). There is a short script that demonstrates the speed of differecnt call +ing methods (direct, OO and Imported) in misc/oo_vs_func.pl =head1 AUTHORS The JHash implementation was written by Bob Jenkins (C<bob_jenkins@burtleburtle.net>). This perl extension was written by Andrew Towers (C<mariofrog@bigpond.com>). A few mods were added by James Freeman (C<james.freeman@id3.org.uk>). =head1 SEE ALSO http://burtleburtle.net/bob/hash/doobs.html =cut

cheers

tachyon

Replies are listed 'Best First'.
Re: Re: Re: Fast string hash in portable perl?
by BrowserUk (Patriarch) on Dec 20, 2003 at 04:43 UTC

    Thanks tachyon, that's great.

    I did compile the original C code as a C program and use it for comparisons with the perl version when I was working on this, though that was some time ago. The C was obviously much faster.

    Unfortunately, despite my best efforts, I still live in the "nothing I must compile myself" perl world.

    I do have a version of 5.8.1 that I compiled with Borland that works, but every time I've tried to use modules that require compilation in conjunction with that build, I spend hours or days trying to hack the build into working. I'm ashamed to say that I gave up and went back to being a AS dependant.

    Thanks to the number of kind folks around that make binary versions of those modules that AS haven't gotten around to yet, I rarely encounter a module that I cannot install and use in short order, given the appropriate googling.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      To compile under AS Perl all you really need is M$ VCC of some variety and A Practical Guide to Compiling C based Modules under ActiveState using Microsoft C++. You just emulate the AS build environment (aka copy all the missing M$ headers etc into Perl/lib/CORE), fixup the environment by running VSVARS32.BAT (Last seen in C:\Program Files\Microsoft Visual Studio .NET\Common7\Tools but you know M$ ;-), occasionally have to copy a DLL into System32 for reasons that totally elude me (and I have never bothered to work out ;-) and nmake and CL.EXE will happily compile all sorts of 'basic' XS stuff ie any vanilla C generally works.

      Now I know VCC costs money but it does make Perling on Win32 using Active State Perl a helluva lot easier. You only need an old copy - but you can pick it up new on ebay for between $50-100 US. There were 3 X Visual C++.NET sealed in box for 50 quid buy now a few minutes ago.

      cheers

      tachyon

        <sarcasm>Oh, is that all?</sarcasm>

        -sauoq
        "My two cents aren't worth a dime.";