in reply to Heap structure for lookup?

From the old days I remember the term red black tree for such search operations and a look up on CPAN says there are modules for it. Not sure whether this suits your requiement.

Replies are listed 'Best First'.
Re^2: Heap structure for lookup?
by BrowserUk (Patriarch) on May 27, 2015 at 08:43 UTC

    The problem is that Tree::RedBlack is implemented in pure Perl, using a blessed hash for each node.

    Without having tried it, I estimate that a tree to hold my 150e6 values would require ~40GB of ram.

    I haven't found an XS implementation, but even then it would require at least 2x64-bit pointers + a 64-bit pointer to an SvUV + 1-bit per node. If they stored the R/B in an unused bit in one of the pointers, then that's 56 bytes * 150e6 = ~8GB, which is too much.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      This means (binary) trees generally are out question, correct?

      Looking into a different direction, is there any structure on your values that might be leveraged? Or are they more or less random 64bit integers?

        This means (binary) trees generally are out question, correct?

        Certainly any CPAN modules that store generic perl scalars. A custom implementation that stored just a U64 value; giving 24 * 150e6 ~ 3.5GB would be possible.

        But it's the need for 2 64-bit pointers that is the main problem, hence my mind drifting to heaps. They self order as you insert, and avoid the pointers.

        But are any of the variations O(logN) searchable?

        is there any structure on your values that might be leveraged? Or are they more or less random 64bit integers?

        No. They can take on any of the 2^64 values and are likely to be distributed right across the range.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked