in reply to Re: Fast algorithm for 2d array queries
in thread Fast algorithm for 2d array queries

Fair enough

"But you fail to say how big the main array is? "

not that big, the size of the main array is 3000. n is 500-1000 (the number of inner arrays from which i need to pick)

"Or what you need to look up. Ie. what do you start with; and what information do you need to end up with?"

i start with my query array which is 500-1000 different ints between 0 and 2999 and i need to end up with the int value from inner arrays that is the most prevalent on those picked 500-1000 arrays. (intersect that has the highest number of elements)

"Or how many lookups you need to do? "

about 15000000 per hour.

  • Comment on Re^2: Fast algorithm for 2d array queries

Replies are listed 'Best First'.
Re^3: Fast algorithm for 2d array queries (Got a cluster?)
by BrowserUk (Patriarch) on Feb 07, 2014 at 11:25 UTC
    "Or how many lookups you need to do? " about 15000000 per hour.

    That's 4000 per second or 1 every 1/4 of a millisecond.

    In that time you want to intersect 500 to 1000 (from 3,000) sets of 50,000 integers and extract the single most populous integer across them all.

    Do you have a 3000 machine cluster available to throw at this problem?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
        Hmm, [ some XXX module ] might speed it up ?

        At the very minimum, the code would need to increment an integer (one of many) , 500 * 50000 times. That takes at least 1.6 seconds in Perl:

        $t=time; $i=0; ++$i for 1 .. 25e6; print time-$t;; 1.66395616531372

        That 7000 times too slow for the OPs target rate.

        And that's before you take into account indexing into the arrays; indexing into the counts; and scanning the final tallies to determine which count is highest.

        This is impossible using a single machine, even if written in highly optimised C or assembler.

        It might just be possible if you can throw enough hardware at the problem such that you could pre-index every possible combination. Maybe.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.