Worst case memory use and running time should both scale with my algorithm like O(N*N*M).
Reverting to hashes instead of arrays (in order to accommodate non-integer values) has really hurt your algorithm. I upped the timeout from 1 to 5 minutes, but I still see many timeouts. And memory consumption often exceeds my 1.5GB limit, causing perl to segfault on the 1e5 and 1e6 tests.
About a reproducible random shuffle: why not just call srand with some specific value?
See my thread at Difference arrays. My native rand is an abysmal 15-bit linear congruential PRNG which, besides being awful for any kind of statistical analysis, is different from your 48-bit generator. So while srand makes it reproducible on my system, it won't produce the same results on your system (nor the OP's). For that reason, I always use Math::Random::MT (which, I checked, *does* produce the same output for any given seeding across systems) for anything that requires a statistically sound PRNG.
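As a minimal sketch of what I mean (not my actual test harness), an explicitly seeded Math::Random::MT generator produces the same stream on every platform, which the built-in rand cannot guarantee:

```perl
#!/usr/bin/perl
# Minimal sketch (not the actual test harness): an explicitly seeded
# Math::Random::MT generator produces the same stream on every platform,
# unlike the platform-specific built-in rand().
use strict;
use warnings;
use Math::Random::MT;

my $mt = Math::Random::MT->new( 42 );        # same seed => same sequence everywhere
printf "%.6f\n", $mt->rand() for 1 .. 3;     # reproducible across systems
```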
The problem is that List::Util::shuffle always uses the platform-specific rand, and my efforts to persuade it to use M::R::MT have (so far) failed. So the next best thing was a pure perl shuffle, which is what I've been using; but as the performance of the shuffle directly affects the performance of my algorithm (and therefore the time taken to run tests), I used an in-place shuffle to mitigate some of the slowdown from not using L::U's XS version. When I posted updated code for the OP, that is what led to the discovery of the empty array bug: I copied and pasted the code directly from my testcase, which used the in-place PP shuffle, which meant that the L::U shuffle was doing nothing on the OP's machine.
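For illustration, a pure perl, in-place Fisher-Yates shuffle driven by M::R::MT rather than the platform rand looks roughly like this (a sketch only, not the code I posted):

```perl
#!/usr/bin/perl
# Sketch of a pure perl, in-place Fisher-Yates shuffle driven by
# Math::Random::MT rather than the platform rand(). Illustrative only;
# not the code from my testcase.
use strict;
use warnings;
use Math::Random::MT;

my $mt = Math::Random::MT->new( 12345 );         # fixed seed for repeatable shuffles

sub mt_shuffle_in_place {
    my $aref = shift;
    for my $i ( reverse 1 .. $#$aref ) {
        my $j = int( $mt->rand( $i + 1 ) );      # pick an index 0 .. $i
        @{ $aref }[ $i, $j ] = @{ $aref }[ $j, $i ];   # swap in place
    }
}

my @data = 1 .. 10;
mt_shuffle_in_place( \@data );
print "@data\n";
```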
But the combination of M::R::MT and a pure perl shuffle means that when I eventually get around to doing some broad random dataset testing, I should be able to show, on the basis of statistical analysis, that the algorithm produces an optimal result in X% of cases, near-optimal results in Y% of cases, etc. Which is what I set out to convince myself of.
Unfortunately, my inability to locate (or correctly code) a simple, memory-efficient array difference algorithm that handles duplicates provided another unnecessary distraction. With those two problems now addressed, I can get on with proving (to myself, in the first place) that the algorithm works across a wide range of inputs, using a combination of code that means my results will be reproducible by others.
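For the record, the kind of duplicate-aware array difference I had in mind can be sketched with a hash of counts (hypothetical helper, not the code I ended up with):

```perl
#!/usr/bin/perl
# Sketch of a simple, duplicate-aware ("bag") array difference: remove one
# element from the first array for each matching element of the second,
# keeping any extras. Hypothetical helper; not the code used in my tests.
use strict;
use warnings;

sub bag_diff {
    my ( $take_from, $subtract ) = @_;
    my %remove;
    ++$remove{ $_ } for @$subtract;                           # how many of each value to drop
    return grep { !( $remove{ $_ } && $remove{ $_ }-- ) } @$take_from;
}

my @a = ( 1, 2, 2, 3, 3, 3 );
my @b = ( 2, 3, 3 );
print join( ' ', bag_diff( \@a, \@b ) ), "\n";                # prints: 1 2 3
```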