BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:
In Re^6: Heap structure for lookup?, I posted the bare bones of a very lightweight hash algorithm that is to be used for lookup purposes only.
The basis of that algorithm is (I think this is the correct mathematical term) co-primality.
Size the underlying array to that of a prime number somewhat bigger than the number of elements you want to store in the hash -- I chose 200,000,033 for my tests.
To insert an element, calculate the hash function -- or in my case, use the value of the 64-bit integer as the hash --, calculate the insertion point as i = hash % PRIME;, and attempt the insertion.
If that slot is already occupied, add the prime number to the previously calculated insertion point and again take the result mod PRIME: i = ( i + PRIME ) % PRIME; and try again.
This simple mechanism has the rather nice property that it will eventually try every possible location before returning to the starting point -- which if it happened would indicate that the array is completely full.
A second nice property is that the steps by which it jumps around the array are different for every initial insertion point; thus (probabilistically) avoiding excessive clustering.
To demonstrate with small numbers. An array size 17:
ReTry: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 First 0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 1 2: 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15 0 2 3: 6 9 12 15 1 4 7 10 13 16 2 5 8 11 14 0 3 4: 8 12 16 3 7 11 15 2 6 10 14 1 5 9 13 0 4 5: 10 15 3 8 13 1 6 11 16 4 9 14 2 7 12 0 5 6: 12 1 7 13 2 8 14 3 9 15 4 10 16 5 11 0 6 7: 14 4 11 1 8 15 5 12 2 9 16 6 13 3 10 0 7 8: 16 7 15 6 14 5 13 4 12 3 11 2 10 1 9 0 8 9: 1 10 2 11 3 12 4 13 5 14 6 15 7 16 8 0 9 10: 3 13 6 16 9 2 12 5 15 8 1 11 4 14 7 0 10 11: 5 16 10 4 15 9 3 14 8 2 13 7 1 12 6 0 11 12: 7 2 14 9 4 16 11 6 1 13 8 3 15 10 5 0 12 13: 9 5 1 14 10 6 2 15 11 7 3 16 12 8 4 0 13 14: 11 8 5 2 16 13 10 7 4 1 15 12 9 6 3 0 14 15: 13 11 9 7 5 3 1 16 14 12 10 8 6 4 2 0 15 16: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 16
The flaw with that is the first row. If the (initial_hash_value % PRIME ) = 0;, then the scheme falls apart and the second and all subsequent hash%prime == 0 values can never be stored.
The code I posted incorporated an iteration count (c), that defended against an infinite loop; but it doesn't address the problem.
My first thought was to size the array 1 larger than the chosen prime, and add 1 to the mod'd value; thus the zeroth element is never used; but that just results in an infinite loop of 1s rather than 0s.
Is there a trivial (or not so) solution to the problem that I'm missing?
Are there any other problems I'm missing?
|
|---|