If the number of discrete values that can appear in your small arrays is less than 32 -- they do not have to be contiguous -- then you can represent each array by a 32-bit number, with each bit representing one of the numbers. This reduces the space required for each million arrays of up to 32 numbers from around 900MB to 4MB.

...101001010010001 ||||| ||||+- 1 |||+-- 16 |||+-- 75 ||+--- 204 |+---- 205 +----- 206 etc.

This also allows the 528 possible pairs to each be represented by a single, 32-bit number with 2-bits set.

Counting the pairs becomes a process of iterating the 'arrays' and masking them against the pairs, stored as keys in a hash,and incrementing the values.

Finding the top three counts in the resultant hash is easy. This process reduces the problem to O(number of arrays), and on my machine allows them to count the pairs at a rate of a little over 1 second per thousand, or around fifteen minutes/1 million with a storage requirement of 4MB per million + plus startup.

#! perl -slw use strict; use List::Util qw[ max ]; use Benchmark::Timer; my $T = new Benchmark::Timer; our $ARRAYS ||= 10; our $K ||= 3; # Generate all the possible combinations of pairs # And create a hash with them as the keys. my %pairs; for my $n ( 0 .. 19 ){ $pairs{ 1<<$n | 2<<$_ } = 0 for $n .. 19; }; # Generate an 'array' of 1 million 'arrays' with up to 20 # numbers in each. my $arrays = ''; vec( $arrays, $_, 32 ) = int rand( 1 << 19 ) for 1 .. $ARRAYS; # Count the pairs $T->start( 'Count the pairs' ); for my $pair ( keys %pairs ) { for ( 0 .. $ARRAYS-1 ) { $pairs{ $pair }++ if ( vec( $arrays, $_, 32 ) & $pair ) == $pa +ir; } } $T->stop( 'Count the pairs' ); # Find the top K pairs $T->start( 'Find the top K' ); my @topK = ( keys %pairs )[ 1 .. 3 ]; for my $pair ( keys %pairs ) { if( $pairs{ $pair } > max( @pairs{ @topK } ) ) { push @topK, $pair; shift @topK if @topK > 3; } } $T->stop( 'Find the top K' ); # Display the results for my $k ( @topK ) { printf 'Pair [ %d, %d ] ', grep{ $k & (1<<$_) ? $_ : () } 0 .. 19; printf 'appeared %d times'. $/, $pairs{ $k }; } $T->report; print 'Record memory usage';<stdin>; __END__ P:\test>305470 -ARRAYS=10000 -K=3 Pair [ 12, 14 ] appeared 2521 times Pair [ 4, 17 ] appeared 2579 times Pair [ 7, 17 ] appeared 2582 times 1 trial of Count the pairs (10.315s total) 1 trial of Find the top K (10.014ms total) Record memory usage 2772k P:\test>305470 -ARRAYS=100000 -K=3 Pair [ 4, 17 ] appeared 25165 times Pair [ 7, 16 ] appeared 25190 times Pair [ 7, 11 ] appeared 25348 times 1 trial of Count the pairs (106.954s total) 1 trial of Find the top K (10.015ms total) Record memory usage 3124k P:\test>305470 -ARRAYS=200000 -K=3 Pair [ 9, 17 ] appeared 50261 times Pair [ 7, 13 ] appeared 50356 times Pair [ 9, 13 ] appeared 50386 times 1 trial of Count the pairs (213.307s total) 1 trial of Find the top K (10.014ms total) Record memory usage 3524k

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
Wanted!


In reply to Re: Best Pairs (O(n) 15mins/million if ...) by BrowserUk
in thread Best Pairs by artist

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.