Worst case memory use and running time should both scale with my algorithm like O(N*N*M).

Reverting to hashes instead of arrays (in order to accomodate non-integer values) has really hurt your algorithm. I upped the timeout from 1 to 5 minutes, but I still see many timeouts. And memory consumption often exceeds my 1.5GB causing perl to segfault on the 1e5 and 1e6 tests.

About a reproducible random shuffle, why not just call srand with some specific value

See my thread at Difference arrays.. My native rand is an abysmal 15-bit linear congruent PRNG, which besides being awful for any kind of statistical analysis, is different from your 48-bit generator. So while srand makes it reproducable on my system, it won't produce the same results on your system (nor the OPs). For that reason, I always use Math::Random::MT (which I checked *does* produce the same output for any given seeding across systems), for anything that requires a statistically sound PRNG.

The problem is that List::Util::shuffle always uses the platform specific rand, and my efforts to pursuade it to use M::R::MT have (so far) failed. So, then next best thing was a pure perl shuffle, which is what I've been using, but as the performance of the shuffle directly affects the performance of my algorithm (and therefore the time taken to run tests), I used an in-place shuffle to mitigate some of the slow down from not using L::U's XS version. When I posted updated code for the OP, that is what lead to the discovery of the empty array bug. I c&pd the code directly from my testcase, which used the in-place PP shuffle, which meant that the L::U shuffle was doing nothing on the OPs machine.

But, the combination of M::R::MT and a pure perl shuffle means that when I eventually get around to doing a some broad band random dataset testing, I should be able to prove that the algorithm will always produce an optimal result in X% of cases, and near optimal results in Y% of cases etc. on the basis of statistical analysis. Which is what I set out to convince myself of.

Unfortunately, my inability to locate (or code) a simple, memory efficient, with duplicates, array difference algorithm correctly, provided for another unnecessary distraction. With those two problems now addressed, I can get on with proving (to myself in the first place) that the algorithm works across a wide range of inputs using a combination of code that mean that my results will be reproducable by others.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^19: Divide array of integers into most similar value halves by BrowserUk
in thread Divide array of integers into most similar value halves by Pepe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.