If there is not a solid 32bit type then I was contemplating a macro that is a noop on 32 bit systems, or does & 0xfffffff on 64 bit systems.

I think simply doing & 0xffffffff on 64-bit systems (whenever a value larger than 0xffffffff could be the result of an individual operation) is not such a bad idea... in case unsigned long happens to be wider than 32-bit (which can be determined easily, either at build- or at runtime). At least I would guess it's less headaches getting it to work portably, than trying to make provisions to always use the appropriate int type, where the compiler is making sure that exactly 32-bit are being used.

The additional and-operations (53 of which are needed here) are typically very fast, so the associated performance penalty on 64-bit platforms is likely going to be acceptable.  In fact, I just benchmarked it. With the test routines computing the jhash for 100 random strings of length 100000, I get on average:

Rate u32 u64 u32 54.9/s -- -13% u64 63.3/s 15% --

where 'u32' is the version with the additional & 0xffffffff instructions, producing correct results, while 'u64' is the original version producing incorrect results with 64-bit ints.

(Perl v5.8.8 built for x86_64-linux-thread-multi; gcc 4.1.0)

With a string length of only 100 chars, the difference reduces to about 4%, because the function calling overhead becomes larger, relatively... The ~14% is close to the 'true' slow down attributable to the additional and-operations — in other words, it's roughly the asymptotic limit, no longer increasing significantly with greater string lengths.

#!/usr/bin/perl use strict; use warnings; use Digest::JHash; use Benchmark qw(cmpthese); my @dat; for my $i (1..100) { $dat[$i] = ''; $dat[$i] .= chr(int(rand(256))) for 1..100000; } cmpthese (100, { u32 => sub { for (1..100) { my $h = Digest::JHash::jhash($dat[$_]); } }, u64 => sub { for (1..100) { my $h = Digest::JHash::jhash_orig($dat[$_]); } }, } );
#define MASK32 0xffffffff #define MIX(a,b,c) \ { \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>13)); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<8) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>13)); \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>12)); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<16) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>5) ); \ a -= b; a &= MASK32; a -= c; a &= MASK32; a ^= ((c>>3) ); \ b -= c; b &= MASK32; b -= a; b &= MASK32; b ^= ((a<<10) & MASK32); \ c -= a; c &= MASK32; c -= b; c &= MASK32; c ^= ((b>>15)); \ } unsigned long jhash( SV* str ) { STRLEN rawlen; char* p; unsigned long a, b, c, len, length; /* extract the string data and string length from the perl scalar +*/ p = (char*)SvPV(str, rawlen); length = len = (unsigned long)rawlen; /* Test for undef or null string case and return 0 */ if ( length == 0 ) { DEBUG && printf( "Recieved a null or undef string!\n" ); return 0; } DEBUG && printf( "Received string '%.*s'.\n", (int)len, p ); a = b = 0x9e3779b9; /* golden ratio suggested by Jenkins */ c = 0; while (len >= 12) { a += (p[0] + ((((unsigned long)p[1]) << 8) & MASK32 ) + ((((unsigned long)p[2]) << 16) & MASK32 ) + ((((unsigned long)p[3]) << 24) & MASK32 )); a &= +MASK32; b += (p[4] + ((((unsigned long)p[5]) << 8) & MASK32 ) + ((((unsigned long)p[6]) << 16) & MASK32 ) + ((((unsigned long)p[7]) << 24) & MASK32 )); b &= +MASK32; c += (p[8] + ((((unsigned long)p[9]) << 8) & MASK32 ) + ((((unsigned long)p[10])<< 16) & MASK32 ) + ((((unsigned long)p[11])<< 24) & MASK32 )); c &= +MASK32; MIX(a, b, c); p += 12; len -= 12; } c += length; switch(len) { case 11: c+=(((unsigned int)p[10]<< 24) & MASK32 ); c &= MASK32; case 10: c+=(((unsigned int)p[9] << 16) & MASK32 ); c &= MASK32; case 9: c+=(((unsigned int)p[8] << 8) & MASK32 ); c &= MASK32; case 8: b+=(((unsigned int)p[7] << 24) & MASK32 ); b &= MASK32; case 7: b+=(((unsigned int)p[6] << 16) & MASK32 ); b &= MASK32; case 6: b+=(((unsigned int)p[5] << 8) & MASK32 ); b &= MASK32; case 5: b+= ((unsigned int)p[4]); b &= MASK32; case 4: a+=(((unsigned int)p[3] << 24) & MASK32 ); a &= MASK32; case 3: a+=(((unsigned int)p[2] << 16) & MASK32 ); a &= MASK32; case 2: a+=(((unsigned int)p[1] << 8) & MASK32 ); a &= MASK32; case 1: a+= ((unsigned int)p[0]); a &= MASK32; } MIX(a, b, c); DEBUG && printf( "Hash value is %d.\n", (int)(c) ); return(c); }

In reply to Re: Perl XS portable uint32_t by almut
in thread Perl XS portable uint32_t by tachyon-II

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.