This means (binary) trees generally are out question, correct?
Certainly any CPAN modules that store generic perl scalars. A custom implementation that stored just a U64 value; giving 24 * 150e6 ~ 3.5GB would be possible.
But it's the need for 2 64-bit pointers that is the main problem, hence my mind drifting to heaps. They self order as you insert, and avoid the pointers.
But are any of the variations O(logN) searchable?
is there any structure on your values that might be leveraged? Or are they more or less random 64bit integers?
No. They can take on any of the 2^64 values and are likely to be distributed right across the range.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] |
If you know the number of values upfront (or an upper barrier to the number) and if the values are roughly equally distributed across the full range (with some clustering of course), you could pre-allocate an array and then come up with a guess where a particular value belongs (the value divided by 2**64 or some algorithm based on the binary representation of that value). Then do some local sorting. If you made the array somewhat larger than needed you would have less clashes.
| [reply] |
What I've come up with is a very simple & compact hash implementation:
The insertion rate of 150e6 (into a fixed-size array of just over 200e6) is less that 1/4 microsecond per; or 37 second for them all; which is fine.
If I push the number of inserts to 190e6, you can visibly see -- watching the trace of every 1e6th insertion -- the insertion rate slow down; but still the insertion rate is less than 1/2 a microsecond per.
The lookups, currently done manually by feeding some of the values traced out back in to check they are found, and a few off-the-top-of-my-head other numbers to check they aren't, and the lookup time is around 20 microseconds per, including the perl->XS->perl transitions and conversions. Which is also very acceptable:
C:\test>lookup64 -N=150e6
16536075677023473182 @ 171436696
...
6917323807836132563 @ 155785214
4887574662696880337 @ 47194064
9190776599876911997 @ 197069741
5378457539553008893 @ 15322593
6479208648984222734 @ 29944101
7965142717433510515 @ 185179020
1980553766129900057 @ 1573360
4821546746934524968 @ 179443020
Inserting 150e6 U64s took 38.428204060 seconds
4821546746934524968
4821546746934524968 @ 179443020 in 0.000000000 s
1980553766129900057
1980553766129900057 @ 1573360 in 0.000020027 s
6479208648984222734
6479208648984222734 @ 29944101 in 0.000023127 s
4887574662696880337
4887574662696880337 @ 47194064 in 0.000015020 s
1
1 @ 0 in 0.000015974 s
5
5 @ 0 in 0.000000000 s
12345
12345 @ 0 in 0.000018120 s
1234567890987654321
1234567890987654321 @ 0 in 0.000018835 s
16536075677023473182
16536075677023473182 @ 171436696 in 0.000000000 s
All together, I'm really surprised that such a simple hash implementation should work quite so well; even allowing for a 25% to 10% headroom in the array.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] [d/l] [select] |