in reply to Re^15: Divide array of integers into most similar value halves (Updated)
in thread Divide array of integers into most similar value halves
Re^17: Divide array of integers into most similar value halves
by BrowserUk (Patriarch) on Sep 05, 2008 at 00:17 UTC
Great! I'm glad you found a solution to your liking. However, you have a decision to make. It's a trade-off between consistently optimal results and consistently predictable timings. tilly's exhaustive search suffers from similar problems to my own attempt at such. Whilst it will eventually find an optimum solution, sometimes quite quickly, quite frequently it will spend an inordinate amount of time finding it. And an inordinate amount of memory too! Below, you will see a comparison of tilly's (latest) and my (latest) code run against the same random data sets with 1e1 through 1e6 elements. Notice how with 1e3 and 1e5 elements, tilly's code was timed out after a full five minutes, whereas the semi-random approach never takes more than ~10 seconds. For the 1e3 case, tilly's code used over 1GB of RAM. For the 1e5 case, it came perilously close to exhausting my 1.5GB of physical memory.
The memory consumption may not be a problem if your data sets are always smallish, but occasionally, even with a relatively small dataset of 100 elements, the exhaustive search will take longer than 60 seconds to find a solution, and yet take only 18 seconds for 1 million!
I appreciate your sentiments elsewhere that tilly's code is reliable--that's why I chose it to verify my own against :)--and its being exhaustive and deterministic makes it easier to test and debug. The non-determinism of a random (genetic) algorithm makes it much harder to test. That's why I've spent the last two days attempting to get a reproducible random shuffle (sketched below).

But for NP-hard problems, genetic algorithms have one huge advantage: you can specify up front how long you are prepared to wait for a solution, either as a maximum number of iterations or as the best solution found within a time limit. That's much harder to arrange with an exhaustive solution.

There is another possibility: combine the two approaches. Start off using the exhaustive approach and--either through a time limit, or (you'd have to consult tilly for the details) through some heuristic guesstimate of how long the given data set is likely to take--switch to the genetic algorithm, perhaps passing the best split found so far by the exhaustive attempt into the GA to see if it can improve on it within some specified number of iterations. A rough sketch of that hand-over is shown below.

In any case, whichever approach meets your needs, thank you for presenting an extremely interesting (if at times frustrating :) problem. That's only the fourth or fifth GA I've ever attempted, and they are frustratingly hard to verify.

I learnt one big thing from your feedback: using small datasets (3 or 4 elements) is a far better testing strategy than large arrays of random values! Three nested loops cycling through every possible triple of values from 0 .. 99 would be trivial and fast to verify by brute force and, as your feedback showed, would highlight bugs very quickly; a sketch of such a harness is also below. Obvious now I've seen it in action. But hey! We're never too old to learn something new.
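As an aside, the reproducible shuffle I was chasing needn't be anything more exotic than this (a minimal sketch, not the code I actually used): seed the PRNG once with a known value, then apply a plain Fisher-Yates shuffle. Any failing run can then be replayed exactly by re-using the same seed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Seeding the PRNG once means every run repeats the same sequence of
## shuffles, so a failing GA run can be reproduced exactly from its seed.
sub seeded_shuffle {
    my( $seed, @items ) = @_;
    srand( $seed );                         ## fixed seed => repeatable order
    for my $i ( reverse 1 .. $#items ) {    ## standard Fisher-Yates
        my $j = int rand( $i + 1 );
        @items[ $i, $j ] = @items[ $j, $i ];
    }
    return @items;
}

print join( ' ', seeded_shuffle( 12345, 1 .. 10 ) ), "\n";   ## same output every run
```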
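And here, roughly, is what I mean by combining the two. exhaustive_search() and ga_improve() are stand-ins for tilly's code and the GA respectively--their names, arguments and return values are assumptions of mine--so treat this as a sketch of the hand-over rather than working code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

## exhaustive_search() and ga_improve() are assumed stand-ins for tilly's
## exhaustive code and the GA; each is assumed to return the two halves
## and the difference between their sums.
sub hybrid_split {
    my( $aref, $seconds ) = @_;
    my $deadline = time() + $seconds;

    ## Phase 1: run the exhaustive search, telling it when to give up.
    ## $complete is assumed true if it searched the whole space in time.
    my( $h1, $h2, $diff, $complete ) = exhaustive_search( $aref, $deadline );
    return ( $h1, $h2, $diff ) if $complete;

    ## Phase 2: seed the GA with the best split found so far and let it
    ## try to improve on it for a bounded number of iterations.
    return ga_improve( $aref, $h1, $h2, 10_000 );
}
```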
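And the small-dataset testing strategy boils down to something like this. candidate_split() is a placeholder for whichever implementation is under test, and I'm assuming "halves" means one part gets int( N/2 ) elements--adjust to taste:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Exhaustive best split for a tiny list: try every subset of size
## int( N/2 ) and keep the smallest absolute difference of the two sums.
sub brute_force_best {
    my @n     = @_;
    my $total = 0;
    $total   += $_ for @n;
    my $half  = int( @n / 2 );
    my $best;
    for my $mask ( 0 .. 2**@n - 1 ) {
        my @in = grep { $mask & ( 1 << $_ ) } 0 .. $#n;
        next unless @in == $half;           ## only true (near-)halves
        my $sum = 0;
        $sum   += $n[ $_ ] for @in;
        my $diff = abs( $total - 2 * $sum );
        $best = $diff if !defined $best or $diff < $best;
    }
    return $best;
}

## candidate_split() is a placeholder for the implementation under test;
## it is assumed to return the difference between the halves it found.
for my $a ( 0 .. 99 ) {
    for my $b ( 0 .. 99 ) {
        for my $c ( 0 .. 99 ) {
            my $want = brute_force_best( $a, $b, $c );
            my $got  = candidate_split( $a, $b, $c );
            warn "FAIL: ($a,$b,$c) got $got, want $want\n" if $got != $want;
        }
    }
}
```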
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.