in reply to Divide array of integers into most similar value halves

Are your lists big? And do you really need the best solution, or is a good approximation sufficient?

I think that Dominus' book Higher Order Perl contains an example for something similar, and if I remember correctly it just went through all possible solutions, which doesn't scale very well for long lists.

Update: Here's a very simple approximative solution (after finding out that it might be good enough):

#!/usr/bin/perl use strict; use warnings; use List::Util qw(sum); divide(8,14,32,29); divide(7,10,12,15,40); sub divide { my @array = reverse sort { $a <=> $b} @_; my $target = sum(@array) / 2; my @result; my $current = 0; for (@array) { if ($current + $_ <= $target) { push @result, $_; $current += $_; } } print "@result\n"; print "Target: $target; Result: $current\n"; } __END__ 32 Target: 41.5; Result: 40 7 15 12 Target: 42; Result: 34

If that's not good enough you can iterate over all pairs of values from distinct sets and see if the reached value improves. If it does, swap these two elements.

(second update: fixed sort. FunkyMonk++)

Replies are listed 'Best First'.
Re^2: Divide array of integers into most similar value halves
by Pepe (Sexton) on Sep 01, 2008 at 20:56 UTC
    Thanks a lot
    I believe it works just fine. Thanks again
Re^2: Divide array of integers into most similar value halves
by Pepe (Sexton) on Sep 01, 2008 at 19:47 UTC
    My lists are not really big. Almost never bigger than 100.
      For a problem with complexity O(2**$n) a value of $n == 100 is enough to make your computer work until either he or your dies. That's why I asked if an approximation is enough.

        So... 2**100 is ~ 10**30. If the computer can try one combination in (just for the sake of argument) ~3 micro-secs, it can try ~ 10**13 combinations per annum. So a result may be expected in 10**17 years. Perhaps we should spend more on the hardware ?

        Actually, it's not as bad as 2**n. 2**n - 1 is the number of different combinations of n things. However, we have two partitions, A & B, and there is some symmetry, which reduces the number of combinations.

        Consider: there are n possible partition A's containing 1 number -- but this is the same arrangement as the n possible partition A's containing n-1 numbers (but with A & B exchanged). So we don't count both of A with 1 number and A with n-1 numbers, and we don't count both A with 2 numbers and A with n-2 numbers, and so on.

        The effect of this is to halve the number of possible arrangements[1].

        Also: we can discount the 1 possible partition A with all the numbers in it.

        Phew. We only have wait half the time we expected. There, that's made a difference :-)

        Mind you, we're not taking into account the doubling of processor speed every 18 months or so. Which makes this sort of problem a kind of inverted Xeno's paradox. Let me see, if in 18 months I can buy a machine that will do twice as much work as the one I have: in 36 months time I can have done the same amount of work by waiting 18 months, and starting then with the faster machine. Also, that machine will be cheaper than one I can buy now ! Of course, in 36 months time I will be able to buy an even cheaper machine which is four times as fast. For maximum efficiency, it is clear: I should do nothing, indefinitely !

        [1] If n is odd, the number of distinct partitions is exactly 2**(n-1) - 1.
        If n is even it's a bit trickier to work out, you have to add (1/2) * (n!/(2*(n/2)!) to that.

        An approximation is more than enough. Actually i could skip the big lists and do it only with lists with less than 10 numbers, if that would help.
        The real motivation for the problem (sorry for not explaining before) is that I need both subarrays to have a sum over a not-so-big threshold (easily achieved with 2 or 3 numbers per array). So if the big array is really big I can assume that the two small ones will pass the threshold.

        Thanks for your help.
        I'm pretty sure the brute force method (find all divisions of all permutations) is O(N3), much better than O(2N)