in reply to Perl XS portable uint32_t
If there is not a solid 32bit type then I was contemplating a macro that is a noop on 32 bit systems, or does & 0xfffffff on 64 bit systems.
I think simply doing & 0xffffffff on 64-bit systems (whenever a value larger than 0xffffffff could be the result of an individual operation) is not such a bad idea... in case unsigned long happens to be wider than 32-bit (which can be determined easily, either at build- or at runtime). At least I would guess it's less headaches getting it to work portably, than trying to make provisions to always use the appropriate int type, where the compiler is making sure that exactly 32-bit are being used.
The additional and-operations (53 of which are needed here) are typically very fast, so the associated performance penalty on 64-bit platforms is likely going to be acceptable. In fact, I just benchmarked it. With the test routines computing the jhash for 100 random strings of length 100000, I get on average:
Rate u32 u64 u32 54.9/s -- -13% u64 63.3/s 15% --
where 'u32' is the version with the additional & 0xffffffff instructions, producing correct results, while 'u64' is the original version producing incorrect results with 64-bit ints.
(Perl v5.8.8 built for x86_64-linux-thread-multi; gcc 4.1.0)
With a string length of only 100 chars, the difference reduces to about 4%, because the function calling overhead becomes larger, relatively... The ~14% is close to the 'true' slow down attributable to the additional and-operations — in other words, it's roughly the asymptotic limit, no longer increasing significantly with greater string lengths.
#!/usr/bin/perl use strict; use warnings; use Digest::JHash; use Benchmark qw(cmpthese); my @dat; for my $i (1..100) { $dat[$i] = ''; $dat[$i] .= chr(int(rand(256))) for 1..100000; } cmpthese (100, { u32 => sub { for (1..100) { my $h = Digest::JHash::jhash($dat[$_]); } }, u64 => sub { for (1..100) { my $h = Digest::JHash::jhash_orig($dat[$_]); } }, } );
#define MASK32 0xffffffff #define MIX(a,b,c) \ { \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>13)); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<8) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>13)); \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>12)); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<16) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>5) ); \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>3) ); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<10) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>15)); \ } unsigned long jhash( SV* str ) { STRLEN rawlen; char* p; unsigned long a, b, c, len, length; /* extract the string data and string length from the perl scalar +*/ p = (char*)SvPV(str, rawlen); length = len = (unsigned long)rawlen; /* Test for undef or null string case and return 0 */ if ( length == 0 ) { DEBUG && printf( "Recieved a null or undef string!\n" ); return 0; } DEBUG && printf( "Received string '%.*s'.\n", (int)len, p ); a = b = 0x9e3779b9; /* golden ratio suggested by Jenkins */ c = 0; while (len >= 12) { a += (p[0] + ((((unsigned long)p[1]) << 8) & MASK32 ) + ((((unsigned long)p[2]) << 16) & MASK32 ) + ((((unsigned long)p[3]) << 24) & MASK32 )); a &= +MASK32; b += (p[4] + ((((unsigned long)p[5]) << 8) & MASK32 ) + ((((unsigned long)p[6]) << 16) & MASK32 ) + ((((unsigned long)p[7]) << 24) & MASK32 )); b &= +MASK32; c += (p[8] + ((((unsigned long)p[9]) << 8) & MASK32 ) + ((((unsigned long)p[10])<< 16) & MASK32 ) + ((((unsigned long)p[11])<< 24) & MASK32 )); c &= +MASK32; MIX(a, b, c); p += 12; len -= 12; } c += length; switch(len) { case 11: c+=(((unsigned int)p[10]<< 24) & MASK32 ); c &= MASK32; case 10: c+=(((unsigned int)p[9] << 16) & MASK32 ); c &= MASK32; case 9: c+=(((unsigned int)p[8] << 8) & MASK32 ); c &= MASK32; case 8: b+=(((unsigned int)p[7] << 24) & MASK32 ); b &= MASK32; case 7: b+=(((unsigned int)p[6] << 16) & MASK32 ); b &= MASK32; case 6: b+=(((unsigned int)p[5] << 8) & MASK32 ); b &= MASK32; case 5: b+= ((unsigned int)p[4]); b &= MASK32; case 4: a+=(((unsigned int)p[3] << 24) & MASK32 ); a &= MASK32; case 3: a+=(((unsigned int)p[2] << 16) & MASK32 ); a &= MASK32; case 2: a+=(((unsigned int)p[1] << 8) & MASK32 ); a &= MASK32; case 1: a+= ((unsigned int)p[0]); a &= MASK32; } MIX(a, b, c); DEBUG && printf( "Hash value is %d.\n", (int)(c) ); return(c); }
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Perl XS portable uint32_t
by tachyon-II (Chaplain) on Jun 07, 2008 at 08:01 UTC | |
by almut (Canon) on Jun 07, 2008 at 10:27 UTC | |
by tachyon-II (Chaplain) on Jun 08, 2008 at 04:15 UTC | |
by syphilis (Archbishop) on Jun 07, 2008 at 23:07 UTC |