in reply to Re^2: Heap structure for lookup?
in thread Heap structure for lookup?
This means (binary) trees generally are out of the question, correct?
Looking in a different direction: is there any structure on your values that might be leveraged? Or are they more or less random 64-bit integers?
Re^4: Heap structure for lookup?
by BrowserUk (Patriarch) on May 27, 2015 at 09:28 UTC
> This means (binary) trees generally are out of the question, correct?

Certainly any CPAN modules that store generic Perl scalars are out. A custom implementation that stored just a U64 value, giving 24 * 150e6 ~ 3.5GB, would be possible; but it's the need for two 64-bit pointers per node that is the main problem, hence my mind drifting to heaps. They self-order as you insert, and avoid the pointers. But are any of the variations O(log N) searchable?

> Is there any structure on your values that might be leveraged? Or are they more or less random 64-bit integers?

No. They can take on any of the 2^64 values and are likely to be distributed right across the range.
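For concreteness, the ~3.5GB figure above follows directly from the node layout; a minimal sketch (the struct is illustrative, not taken from any actual module):

```c
#include <stdint.h>
#include <stdio.h>

/* A pointer-based binary tree node for a bare U64 key: 8 bytes of value
   plus two 8-byte child pointers = 24 bytes per node on a 64-bit platform. */
struct node {
    uint64_t     value;
    struct node *left, *right;
};

int main(void) {
    const size_t n = 150000000;   /* 150e6 values */
    printf("node size: %zu bytes\n", sizeof(struct node));               /* 24 */
    printf("total: %.2f GB\n", (double)(n * sizeof(struct node)) / 1e9); /* 3.60 */
    return 0;
}
```

24 bytes * 150e6 is 3.6e9 bytes, about 3.35 GiB, in line with the rough 3.5GB quoted; two thirds of that is pointer overhead, which is exactly what a pointer-free structure like a heap avoids.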
by hdb (Monsignor) on May 27, 2015 at 10:44 UTC
If you know the number of values upfront (or an upper bound on the number), and if the values are roughly equally distributed across the full range (with some clustering, of course), you could pre-allocate an array and then compute a guess at where a particular value belongs (the value divided by 2**64, or some algorithm based on the binary representation of that value), then do some local sorting. If you made the array somewhat larger than needed, you would have fewer clashes.
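A minimal C sketch of this idea (names are illustrative, and the "local sorting" step is simplified here to forward probing): the initial slot is the value's position within the full 2^64 range, scaled to the table size.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Pre-allocate a table somewhat larger than the expected count; 0 doubles
   as the empty marker, so a real value of 0 would need special-casing. */
static uint64_t *table;
static size_t    nslots;

/* Initial guess: (value / 2^64) * nslots, done in integer arithmetic
   via a 128-bit multiply (a GCC/Clang extension). */
static size_t guess_slot(uint64_t v) {
    return (size_t)(((__uint128_t)v * nslots) >> 64);
}

static void insert(uint64_t v) {
    size_t i = guess_slot(v);
    while (table[i] != 0 && table[i] != v)  /* clash: probe forward */
        i = (i + 1) % nslots;
    table[i] = v;
}

int main(void) {
    nslots = 16;                            /* tiny table just for the demo */
    table  = calloc(nslots, sizeof *table);
    insert(0x8000000000000000ULL);          /* mid-range value -> slot 8 */
    insert(0x8000000000000001ULL);          /* clashes, lands in slot 9 */
    for (size_t i = 0; i < nslots; i++)
        if (table[i]) printf("slot %zu: %llx\n", i, (unsigned long long)table[i]);
    free(table);
    return 0;
}
```

Because nearby values map to nearby slots, this keeps keys in roughly sorted order, at the cost of clustering when the input isn't uniform.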
by BrowserUk (Patriarch) on May 27, 2015 at 11:42 UTC
What I've come up with is a very simple & compact hash implementation.
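The posted code isn't reproduced in the thread text above; the following is a minimal sketch of one way such a scheme could look. The hash function, probe strategy, empty-slot marker, and exact table size are all assumptions, not BrowserUk's actual choices.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* SLOTS: "just over 200e6" entries of 8 bytes each, about 1.6GB total,
   with no per-element pointers. EMPTY assumes the value 0 never occurs
   in the real data; a separate "zero present" flag would fix that. */
#define SLOTS 201000000ull
#define EMPTY 0ull

static uint64_t *tbl;

/* Any good 64-bit mixer works as the hash; this is the splitmix64 finalizer. */
static uint64_t mix(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void insert(uint64_t v) {
    size_t i = (size_t)(mix(v) % SLOTS);
    while (tbl[i] != EMPTY && tbl[i] != v)   /* linear probing on collision */
        i = (i + 1) % SLOTS;
    tbl[i] = v;
}

static int lookup(uint64_t v) {
    size_t i = (size_t)(mix(v) % SLOTS);
    while (tbl[i] != EMPTY) {
        if (tbl[i] == v) return 1;
        i = (i + 1) % SLOTS;
    }
    return 0;                                /* empty slot ends the probe run */
}

int main(void) {
    tbl = calloc(SLOTS, sizeof *tbl);        /* all slots start EMPTY */
    if (!tbl) return 1;
    insert(0xDEADBEEFCAFEF00DULL);
    printf("%d %d\n", lookup(0xDEADBEEFCAFEF00DULL), lookup(42));  /* 1 0 */
    free(tbl);
    return 0;
}
```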
The insertion rate for 150e6 values (into a fixed-size array of just over 200e6 slots) is less than 1/4 microsecond per insert, or about 37 seconds for them all, which is fine. If I push the number of inserts to 190e6 you can visibly see the insertion rate slow down, watching the trace of every 1e6th insertion; but even then it stays below 1/2 a microsecond per insert.

The lookups, currently tested manually by feeding some of the traced values back in to check they are found, plus a few off-the-top-of-my-head numbers to check they aren't, take around 20 microseconds each, including the perl->XS->perl transitions and conversions. That is also very acceptable.
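The slowdown as the table fills is what the standard open-addressing analysis predicts. Assuming linear probing at load factor α (the actual probe strategy isn't shown in the thread), Knuth's classic estimates for the expected number of probes are:

```latex
C_{\text{hit}} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
\qquad
C_{\text{miss}} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)
```

At 150e6 values in ~200e6 slots, α ≈ 0.75 and an insert (which costs a miss-style probe run) touches about 8.5 slots on average; at 190e6, α ≈ 0.95 and that grows to roughly 200 probes, which is why the slowdown becomes visible even though the quoted per-insert times, being averages over the whole run, move less dramatically.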
Altogether, I'm really surprised that such a simple hash implementation should work quite so well, even with only 25% down to 10% headroom in the array.