What I've come up with is a very simple & compact hash implementation: #! perl -slw
use strict;
use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'END_C', NAME => 'Lookup64', CLEAN_AFTER_BUILD =>0,
+ TYPEMAPS => './lookup64.typemap';
#include "C\mytypes.h"
#include "C\rand64.h"
#include <time.h>
#define N (200000033)
U64 *a;
void build( int M ) {
U32 i;
a = calloc( N, sizeof( U64 ) );
srand64( __rdtsc(), _time64( NULL ) );
for( i=0; i < M; ++i ) {
U64 r = rand64();
U32 modr = r % N, n = modr, c = 0;
while( a[n] && ++c < N ) {
n = ( n + modr ) % N;
}
a[ n ] = r;
if( ( i % 1000000 ) == 0 ) printf( "%I64u @ %u\n", r, n );
}
}
int lookup( U64 v ) {
U32 modr = v % N, n = modr, c = 0;
while( a[n] != v && a[n] != 0 && ++c < N ) {
n = ( n + modr ) % N;
}
if( a[n] != v ) return 0;
return n;
}
END_C
package main;
use Time::HiRes qw[ time ];
our $N //= 1000;
<STDIN>;
my $start = time;
build( $N );
my $end = time;
printf( "Inserting $N U64s took %.9f seconds\n", $end-$start );
$start = time;
while( my $v = <STDIN> ) {
chomp( $v );
$start = time;
my $p = lookup( $v );
$end = time;
printf "%u @ %u in %.9f s\n", $v, $p, $end - $start;
}
The insertion rate of 150e6 (into a fixed-size array of just over 200e6) is less that 1/4 microsecond per; or 37 second for them all; which is fine.
If I push the number of inserts to 190e6, you can visibly see -- watching the trace of every 1e6th insertion -- the insertion rate slow down; but still the insertion rate is less than 1/2 a microsecond per.
The lookups, currently done manually by feeding some of the values traced out back in to check they are found, and a few off-the-top-of-my-head other numbers to check they aren't, and the lookup time is around 20 microseconds per, including the perl->XS->perl transitions and conversions. Which is also very acceptable:
C:\test>lookup64 -N=150e6
16536075677023473182 @ 171436696
...
6917323807836132563 @ 155785214
4887574662696880337 @ 47194064
9190776599876911997 @ 197069741
5378457539553008893 @ 15322593
6479208648984222734 @ 29944101
7965142717433510515 @ 185179020
1980553766129900057 @ 1573360
4821546746934524968 @ 179443020
Inserting 150e6 U64s took 38.428204060 seconds
4821546746934524968
4821546746934524968 @ 179443020 in 0.000000000 s
1980553766129900057
1980553766129900057 @ 1573360 in 0.000020027 s
6479208648984222734
6479208648984222734 @ 29944101 in 0.000023127 s
4887574662696880337
4887574662696880337 @ 47194064 in 0.000015020 s
1
1 @ 0 in 0.000015974 s
5
5 @ 0 in 0.000000000 s
12345
12345 @ 0 in 0.000018120 s
1234567890987654321
1234567890987654321 @ 0 in 0.000018835 s
16536075677023473182
16536075677023473182 @ 171436696 in 0.000000000 s
All together, I'm really surprised that such a simple hash implementation should work quite so well; even allowing for a 25% to 10% headroom in the array.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] [d/l] [select] |