Re^2: Fast algorithm for 2d array queries

Fair enough

"But you fail to say how big the main array is? "

not that big, the size of the main array is 3000. n is 500-1000 (the number of inner arrays from which i need to pick)

"Or what you need to look up. Ie. what do you start with; and what information do you need to end up with?"

i start with my query array which is 500-1000 different ints between 0 and 2999 and i need to end up with the int value from inner arrays that is the most prevalent on those picked 500-1000 arrays. (intersect that has the highest number of elements)

"Or how many lookups you need to do? "

about 15000000 per hour.

Comment on Re^2: Fast algorithm for 2d array queries

Replies are listed 'Best First'.
Re^3: Fast algorithm for 2d array queries (Got a cluster?) by BrowserUk (Patriarch) on Feb 07, 2014 at 11:25 UTC
"Or how many lookups you need to do? " about 15000000 per hour. That's 4000 per second or 1 every 1/4 of a millisecond. In that time you want to intersect 500 to 1000 (from 3,000) sets of 50,000 integers and extract the single most populous integer across them all. Do you have a 3000 machine cluster available to throw at this problem? With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^4: Fast algorithm for 2d array queries (Got OpenGL::Array) by Anonymous Monk on Feb 07, 2014 at 12:13 UTC
Hmm, https://metacpan.org/pod/OpenGL::Array#Calc:-POPULATING-AND-MANIPULATING-OpenGL::Array-OBJECTS might speed it up ? somehow :D	[reply]
Re^5: Fast algorithm for 2d array queries (Got OpenGL::Array) by BrowserUk (Patriarch) on Feb 07, 2014 at 12:36 UTC
Hmm, [ some XXX module ] might speed it up ? At the very minimum, the code would need to increment an integer (one of many) , 500 * 50000 times. That takes at least 1.6 seconds in Perl: `$t=time; $i=0; ++$i for 1 .. 25e6; print time-$t;; 1.66395616531372` [download] That 7000 times too slow for the OPs target rate. And that's before you take into account indexing into the arrays; indexing into the counts; and scanning the final tallies to determine which count is highest. This is impossible using a single machine, even if written in highly optimised C or assembler. It might just be possible if you can throw enough hardware at the problem such that you could pre-index every possible combination. Maybe. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^6: Fast algorithm for 2d array queries (GPU OpenGL::Array) by Anonymous Monk on Feb 07, 2014 at 21:19 UTC
Re^7: Fast algorithm for 2d array queries (GPU OpenGL::Array) by BrowserUk (Patriarch) on Feb 07, 2014 at 22:27 UTC