Whilst I agree that the algorithm is limited to numerics, (the triump of the specific--the OP's "positive integers"--over the generic), I disagree with your big O assessment.

It perfectly feasible to pre-allocate the bitstrings to their largest possible size (2^32/8 = 512MB each). Easily and relativly quickly achieved on a fully populated 32-bit machine. But more importantly, it only need be done once regardless of the size of the input arrays, so it is an O(2) constant factor, and is disregarded by big-O.

Of course, allocating huge chunks of ram for small input arrays means that the constant factor would be proportionally quite large, but that's the limitation of big-O.

Conversely, the hashes used by the other algorithms grow by a process of repeated reallocation in geometrically increasing chunks, which means the time cost grows geometrically with size of the input arrays. In addition, the grep solutions have to build stack-bound lists from each of the arrays.

So, if you are going to try and account for the memory allocation in assesment of the algorithms, then you would have to come up with something like:

O( M+N + SIGMA( 2^3+2^4+...+2^n>N) + SIGMA( 2^3+...+2^m >M ) + MIN(N,M +) ) memory for 1st hash memory for 2nd hash memory +for 1 array* ## (*assuming you chose the smallest array to use with the 2hash/1 lis +t method)

or

O( M+N + SIGMA( 2^3+2^4+...+2^n>N) + N + M) ) memory for smallest hash memory for the two arrays ## for the 1 hash/2 arrays method.

All of which reenforces my stance that big-O is rarely done properly outside of academic papaers, and is of little use unless it is. A benchmark is more instructive. This run uses arrays of 1 million apiece, with the numbers in the range 0 .. 2^27:

C:\test>unionIntersect.pl -N=1e6 -MAX=27 s/iter barvin roy buk buk2 barvin 8.58 -- -43% -58% -64% roy 4.86 77% -- -26% -37% buk 3.59 139% 35% -- -15% buk2 3.05 182% 60% 18% --

I couldn't go higher than 2^27 because of the need to have enough memory in the process to accommodate the large hashes and lists created by the other algorithms as well. Of course, you do have to go to quite large input arrays before that constant factor for allocating the bitstrings amortises:

C:\test>unionIntersect.pl -N=5e5 -MAX=27 s/iter barvin buk roy buk2 barvin 3.87 -- -28% -40% -50% buk 2.80 39% -- -17% -30% roy 2.33 66% 20% -- -16% buk2 1.95 98% 43% 19% --

BTW: Buk2 simply uses RoyJohnstone's trick of only counting the bits for one solution and deriving the other number with arithmetic:

sub buk2{ my( $aRef, $bRef ) = @_; my( $aBits, $bBits ); for( $aBits, $bBits ) { local $\; open my $ram, '>', \$_; seek $ram, $MAX-1, 0; print $ram chr(0); close $ram; } vec( $aBits, $_, 1 ) = 1 for @$aRef; vec( $bBits, $_, 1 ) = 1 for @$bRef; my $iCount = unpack '%32b*', ( $aBits & $bBits ); my $uCount = @$aRef + @$bRef - $iCount; $aBits = $bBits = ''; print 'buk2: ', $iCount, ' : ', $uCount if $SHOW; return( $iCount, $uCount ); }

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^3: Fast, Efficient Union and Intersection on arrays by BrowserUk
in thread Fast, Efficient Union and Intersection on arrays by barvin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.