comment on

We have this hashing function (Jenkins Hash) as an XS extension, yes you have to compile it (we have compiled on Win32 and Linux without issues) but the result is of course much faster. You can get a copy from http://tachyon.perlmonk.org/Digest-JHash-0.02.tar.gz

We did a fair bit of dispersion testing on it and found it to be pretty damn good using URLs as the input data which is what we wanted it for. The dispersion peaks at somewhere around 90%+ of slots from memory.

Maybe we should post Digest::JHash and Digest::JHahs::PurePerl to CPAN?. Here it is as XS (Digest::JHash.XS):

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

/* Jenkins Hash http://burtleburtle.net/bob/hash/doobs.html */

const int DEBUG = 0;

#define MIX(a,b,c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

unsigned long jhash( SV* str )
{
    STRLEN rawlen;
    char* p;
    unsigned long a, b, c, len, length;

    /* extract the string data and string length from the perl scalar 
+*/
    p = (char*)SvPV(str, rawlen);
    length = len = (unsigned long)rawlen;

    /* Test for undef or null string case and return 0 */
    if ( length == 0 ) {
        DEBUG && printf( "Recieved a null or undef string!\n" );
      return 0;
    }

    DEBUG && printf( "Received string '%.*s'.\n", (int)len, p );

    a = b = 0x9e3779b9;        /* golden ratio suggested by Jenkins */
    c = 0;
    while (len >= 12)
    {
        a += (p[0] + (((unsigned long)p[1])<<8) + (((unsigned long)p[2
+])<<16) +
              (((unsigned long)p[3])<<24));
        b += (p[4] + (((unsigned long)p[5])<<8) + (((unsigned long)p[6
+])<<16) +
              (((unsigned long)p[7])<<24));
        c += (p[8] + (((unsigned long)p[9])<<8) + (((unsigned long)p[1
+0])<<16) +
              (((unsigned long)p[11])<<24));
        MIX(a, b, c);
        p += 12;
        len -= 12;
    }
    c += length;
    switch(len) {
    case 11: c+=((unsigned int)p[10]<<24);
    case 10: c+=((unsigned int)p[9]<<16);
    case 9:  c+=((unsigned int)p[8]<<8);
    case 8:  b+=((unsigned int)p[7]<<24);
    case 7:  b+=((unsigned int)p[6]<<16);
    case 6:  b+=((unsigned int)p[5]<<8);
    case 5:  b+=((unsigned int)p[4]);
    case 4:  a+=((unsigned int)p[3]<<24);
    case 3:  a+=((unsigned int)p[2]<<16);
    case 2:  a+=((unsigned int)p[1]<<8);
    case 1:  a+=((unsigned int)p[0]);
    }
    MIX(a, b, c);

    DEBUG && printf( "Hash value is %d.\n", (int)(c) );

    return(c);
}

MODULE = Digest::JHash        PACKAGE = Digest::JHash

unsigned long
jhash(str)
    SV*    str




# Digest::JHash.pm


package Digest::JHash;

use strict;

require Exporter;
require DynaLoader;

our @ISA = qw(Exporter DynaLoader);

our @EXPORT_OK = qw( jhash );
our $VERSION   = '0.02';

bootstrap Digest::JHash $VERSION;

1;

=head1 NAME

Digest::JHash - Perl extension for JHash Hashing Algoritm

=head1 SYNOPSIS

    use Digest::JHash qw(jhash);

    $digest = jhash($data);

    # note that calling jhash() directly like this is the fastest way:

    $digest = Digest::JHash::jhash($data);

=head1 DESCRIPTION

The C<Digest::JHash> module allows you to use the fast JHash hashing a
+lgorithm
developed by Bob Jenkins from within Perl programs. The algorithm take
+s as
input a message of arbitrary length and produces as output a 32-bit
"message digest" of the input in the form of an unsigned long integer.

Call it a low calorie version of MD5 if you like.

See http://burtleburtle.net/bob/hash/doobs.html for more information.

=head1 FUNCTIONS

=over 4

=item jhash($data)

This function will calculate the JHash digest of the "message" in $dat
+a
and return a 32 bit integer result (an unsigned long in the C)

=back

=head1 EXPORTS

None by default but you can have the jhash() function if you ask nicel
+y.
See below for reasons not to use Exporter (it is slower than a direct 
+call)

=head1 SPEED NOTE

If speed is a major issue it is roughly twice as fast to do call the j
+hash()
function like Digest::JHash::jhash('message') than it is to import the
jhash() method using Exporter so you can call it as simply jhash('mess
+age').
There is a short script that demonstrates the speed of differecnt call
+ing
methods (direct, OO and Imported) in misc/oo_vs_func.pl

=head1 AUTHORS

The JHash implementation was written by Bob Jenkins
(C<bob_jenkins@burtleburtle.net>).

This perl extension was written by Andrew Towers
(C<mariofrog@bigpond.com>).

A few mods were added by James Freeman
(C<james.freeman@id3.org.uk>).

=head1 SEE ALSO

http://burtleburtle.net/bob/hash/doobs.html

=cut
[download]

cheers

tachyon

In reply to Re: Re: Fast string hash in portable perl? by tachyon
in thread Fast string hash in portable perl? by jhanna

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.