in reply to Re^15: Divide array of integers into most similar value halves (Updated)
in thread Divide array of integers into most similar value halves
Re^17: Divide array of integers into most similar value halves
by BrowserUk (Patriarch) on Sep 05, 2008 at 00:17 UTC
Great! I'm glad you found a solution to your liking. However, you have a decision to make. It's a trade-off between consistently optimal results and consistently predictable timings. tilly's exhaustive search suffers from similar problems to my own attempt at such. Whilst it will eventually find an optimum solution, sometimes quite quickly, quite frequently it will spend an inordinate amount of time finding it. And an inordinate amount of memory too! Below, you will see a comparison of tilly's (latest) and my (latest) code run against the same random data sets with 1e1 through 1e6 elements. Notice how with 1e3 and 1e5 elements, tilly's code was timed out after a full five minutes, whereas the semi-random approach never takes more than ~10 seconds. For the 1e3 case, tilly's code used over 1GB of RAM. For the 1e5 case, it came perilously close to exhausting my 1.5GB of physical memory.
The memory consumption may not be a problem if your data sets are always smallish, but occasionally, even with a relatively small dataset of 100 elements, the exhaustive search will take longer than 60 seconds to find a solution, and yet take only 18 seconds for 1 million!
I appreciate your sentiments elsewhere that tilly's code is reliable--that's why I chose it to verify my own against :)--and its being exhaustive and deterministic makes it easier to test and debug. The non-determinism of a random (genetic) algorithm makes it much harder to test. That's why I've spent the last two days attempting to get a reproducible random shuffle (sketched below).

But for NP-hard problems, genetic algorithms have one huge advantage: you can specify up front how long you are prepared to wait for a solution, either as a maximum number of iterations or as the best solution found within a time limit. That's much harder to arrange with an exhaustive solution.

There is another possibility: combine the two approaches. Start off using the exhaustive approach and--either through a time limit, or (you'd have to consult tilly for the details) through some heuristic guesstimate of how long the given data set is likely to take--switch to the genetic algorithm, perhaps passing the best split found so far by the exhaustive attempt into the GA to see if it can improve on it within some specified number of iterations. A rough sketch of that hand-over is shown below.

In any case, whichever approach meets your needs, thank you for presenting an extremely interesting (if at times frustrating :) problem. That's only the fourth or fifth GA I've ever attempted, and they are frustratingly hard to verify.

I learnt one big thing from your feedback: using small datasets (3 or 4 elements) is a far better testing strategy than large arrays of random values! Three nested loops cycling through every possible triple of values from 0 .. 99 would be trivial and fast to verify by brute force and, as your feedback showed, would highlight bugs very quickly; a sketch of such a harness is also below. Obvious now I've seen it in action. But hey! We're never too old to learn something new.
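As an aside, the reproducible shuffle I was chasing needn't be anything more exotic than this (a minimal sketch, not the code I actually used): seed the PRNG once with a known value, then apply a plain Fisher-Yates shuffle. Any failing run can then be replayed exactly by re-using the same seed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Seeding the PRNG once means every run repeats the same sequence of
## shuffles, so a failing GA run can be reproduced exactly from its seed.
sub seeded_shuffle {
    my( $seed, @items ) = @_;
    srand( $seed );                         ## fixed seed => repeatable order
    for my $i ( reverse 1 .. $#items ) {    ## standard Fisher-Yates
        my $j = int rand( $i + 1 );
        @items[ $i, $j ] = @items[ $j, $i ];
    }
    return @items;
}

print join( ' ', seeded_shuffle( 12345, 1 .. 10 ) ), "\n";   ## same output every run
```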
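And here, roughly, is what I mean by combining the two. exhaustive_search() and ga_improve() are stand-ins for tilly's code and the GA respectively--their names, arguments and return values are assumptions of mine--so treat this as a sketch of the hand-over rather than working code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

## exhaustive_search() and ga_improve() are assumed stand-ins for tilly's
## exhaustive code and the GA; each is assumed to return the two halves
## and the difference between their sums.
sub hybrid_split {
    my( $aref, $seconds ) = @_;
    my $deadline = time() + $seconds;

    ## Phase 1: run the exhaustive search, telling it when to give up.
    ## $complete is assumed true if it searched the whole space in time.
    my( $h1, $h2, $diff, $complete ) = exhaustive_search( $aref, $deadline );
    return ( $h1, $h2, $diff ) if $complete;

    ## Phase 2: seed the GA with the best split found so far and let it
    ## try to improve on it for a bounded number of iterations.
    return ga_improve( $aref, $h1, $h2, 10_000 );
}
```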
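And the small-dataset testing strategy boils down to something like this. candidate_split() is a placeholder for whichever implementation is under test, and I'm assuming "halves" means one part gets int( N/2 ) elements--adjust to taste:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Exhaustive best split for a tiny list: try every subset of size
## int( N/2 ) and keep the smallest absolute difference of the two sums.
sub brute_force_best {
    my @n     = @_;
    my $total = 0;
    $total   += $_ for @n;
    my $half  = int( @n / 2 );
    my $best;
    for my $mask ( 0 .. 2**@n - 1 ) {
        my @in = grep { $mask & ( 1 << $_ ) } 0 .. $#n;
        next unless @in == $half;           ## only true (near-)halves
        my $sum = 0;
        $sum   += $n[ $_ ] for @in;
        my $diff = abs( $total - 2 * $sum );
        $best = $diff if !defined $best or $diff < $best;
    }
    return $best;
}

## candidate_split() is a placeholder for the implementation under test;
## it is assumed to return the difference between the halves it found.
for my $a ( 0 .. 99 ) {
    for my $b ( 0 .. 99 ) {
        for my $c ( 0 .. 99 ) {
            my $want = brute_force_best( $a, $b, $c );
            my $got  = candidate_split( $a, $b, $c );
            warn "FAIL: ($a,$b,$c) got $got, want $want\n" if $got != $want;
        }
    }
}
```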
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.