Re: Divide array of integers into most similar value halves
by moritz (Cardinal) on Sep 01, 2008 at 19:35 UTC
|
Are your lists big? And do you really need the best solution, or is a good approximation sufficient?
I think that Dominus' book Higher Order Perl contains an example for something similar, and if I remember correctly it just went through all possible solutions, which doesn't scale very well for long lists.
Update: Here's a very simple approximative solution (after finding out that it might be good enough):
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);
divide(8,14,32,29);
divide(7,10,12,15,40);
sub divide {
my @array = reverse sort { $a <=> $b} @_;
my $target = sum(@array) / 2;
my @result;
my $current = 0;
for (@array) {
if ($current + $_ <= $target) {
push @result, $_;
$current += $_;
}
}
print "@result\n";
print "Target: $target; Result: $current\n";
}
__END__
32
Target: 41.5; Result: 40
7 15 12
Target: 42; Result: 34
If that's not good enough you can iterate over all pairs of values from distinct sets and see if the reached value improves. If it does, swap these two elements.
(second update: fixed sort. FunkyMonk++) | [reply] [d/l] |
|
|
Thanks a lot
I believe it works just fine. Thanks again
| [reply] |
|
|
My lists are not really big. Almost never bigger than 100.
| [reply] |
|
|
For a problem with complexity O(2**$n) a value of $n == 100 is enough to make your computer work until either he or your dies. That's why I asked if an approximation is enough.
| [reply] [d/l] [select] |
|
|
|
|
|
|
|
|
|
|
|
Re: Divide array of integers into most similar value halves
by Skeeve (Parson) on Sep 01, 2008 at 19:48 UTC
|
You know that this problem is np-complete? It's well known as the partition problem.
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
| [reply] [d/l] [select] |
|
|
I read about it, but the Partition Problem just tells you if the list of numbers can be partitioned into 2 halves that have the same sum.
I'm not really interested into checking that. I don't mind them having the same or different sum. I only want the best possible partition.
Thanks anyway.
| [reply] |
|
|
Strange! The german wikipedia page states "Gesucht wird eine Aufteilung dieser Zahlen auf zwei Haufen, so dass die Differenz der Summen der Zahlen in den beiden Haufen möglichst klein ist." (Find a partition such that the diffference of the sum of each heap is minimal). And this is exactly what you're after.
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
| [reply] [d/l] [select] |
|
|
|
|
|
|
| [reply] |
|
|
If you had an algorithm that found the best possible partition and there is one with difference 0 (for some array of integers) then your algorithm would find that one, otherwise you would find a difference>0. So your algorithm would be a solution to the partition problem.
| [reply] |
|
|
But to find out if your solution is the best partition (and it's value is not sum/2) you have to check if there is a possible better partition, which means checking if the partition with the value sum/2 exists. Thus you have to solve the partition problem. D'oh.
| [reply] |
Re: Divide array of integers into most similar value halves
by FunkyMonk (Bishop) on Sep 01, 2008 at 19:46 UTC
|
How about...
- @numbers = reverse sort @numbers
- $target = sum(@numbers)/2
- loop forever {
- find largest $number in @numbers that is smaller than $target
- exit loop if not found
- add that number to @bucket
- remove that number from @numbers
- subtract that number from $target
- }
- return @bucket and @numbers
and in perl 5.10...
use List::Util qw(sum);
use List::MoreUtils qw(first_index);
#my @numbers = map { rand() * 100 } 1 .. 5;
my @numbers = (8,14,32,29);
my @b = split_evenly( \@numbers );
say "First container: sum(@{$b[0]}) = ", sum @{$b[0]};
say "Second container: sum(@{$b[1]}) = ", sum @{$b[1]};
sub split_evenly {
my @numbers = reverse sort { $a <=> $b } @{+shift};
my $target = sum(@numbers) / 2;
say "Target is $target";
my @b;
while ( 1 ) {
my $index = first_index { $_ <= $target } @numbers;
last if $index < 0;
$target -= $numbers[$index];
push @b, splice @numbers, $index, 1;
}
return \@b, \@numbers;
}
replace the says with print for perl < 5.10.
Tested, but not exhaustively. Use at your own risk. etc.
Update: changed the output so the lists are displayed.
| [reply] [d/l] |
|
|
Unfortunately, this algorithm can fail. I didn't turn up an example with small numbers after some thinking, so here's one that I stole from an article transitively linked by Skeeve: the set {62, 83, 121, 281, 486, 734, 771, 854, 885, 1003} has a perfect partition (namely, {62, 83, 121, 486, 885, 1003}), but the greedy algorithm that you suggest returns {1003, 885, 734}, which has a defect of 18 (not 32, I think, despite the article).
The algorithm I used to find that match is based on my imperfect recall of the one from Dominus's book mentioned by moritz. I'm sure it can be made much more elegant (for example, by not hard-wiring @list; and, less trivially, by not passing in $so_far_ref—I was just using it to print the diagnostics), but I think that this works:
| [reply] [d/l] [select] |
|
|
| [reply] |
|
|
Thanks for noting. I'm still trying it out the first algorithm in my data. I believe is close enough for what I want.
Of course I'm open to any improvements...
Thanks a lot.
| [reply] |
|
|
Hey FunkyMonk,
sorry to bother you again, but I've been extensively testing the script you posted and works most of the time but fails in cases as:
@array = (41,37,37,43)
returning
@array1=(43)
@array2=(41,37,37)
Any ideas?
Thanks in advance
Pepe
| [reply] |
|
|
Take a look at moritz's orginal reply and its followups. That's the difference between doing it correctly (their discussion) and an evil, dirty, quick hack (my version).
It all depends with how bad "good enough" can be :)
My program will never allow the first container to hold more than $target. All the other numbers go into the second. With (41,37,37,43) as input, 43 goes into the first container, and none of the other numbers will fit, so all the rest go into the second container.
Tweaking the comparison in first_index will allow this dataset to be divided more evenly, but will produce worse solutions in some cases. For example, making this change
first_index { $_ <= $target*1.1 } # allow container to overflow by 10%
produces (I changed the output format slightly):
Original numbers: (41 37 37 43)
Target is 79
First container: sum(43 37) = 80
Second container: sum(41 37) = 78
The code is fast, so there's nothing to stop you dividing your numbers twice using the untweaked and tweaked comparison and choosing the answer you like best.
But, all that does is make a dirty, evil, quick hack dirtier, more evil and slower. You'll still be able to find cases where it fails.
The only real solution is to do it properly.
| [reply] [d/l] [select] |
|
|
|
|
So far works for me,
I have to test it in 20.000 arrays, so I will write some code to prove that it works fine, but it seems to do pretty well.
Thanks a lot It's been great help!!!!
| [reply] |
Re: Divide array of integers into most similar value halves
by BrowserUk (Patriarch) on Sep 02, 2008 at 05:41 UTC
|
Note: This only works with positive integers!
Try this. It uses a semi-random approach with a specifiable limit to bound the attempts it makes. It seems to do a pretty good job of finding the optimium solution most of the time. It will occasionally miss, but when it does, it still delivers a close to optimium result:
#! perl -slw
use strict;
use List::Util qw[ sum shuffle ];
sub partition {
my( $limit, $aRef ) = @_;
my @in = sort{ $a <=> $b } @$aRef;
my $target = sum( @in ) >> 1;
my( $best, @best ) = 9e99;
my $soFar = 0;
my @half;
for( 1 .. $limit ) {
#print "$soFar : [@half] [@in] [@best]"; <>;
$soFar += $in[ 0 ], push @half, shift @in while $soFar < $targ
+et;
return( \@half, \@in ) if $soFar == $target;
my $diff = abs( $soFar - $target );
( $best, @best ) = ( $diff, @half ) if $diff < $best;
$soFar -= $half[ 0 ], push @in, shift @half while $soFar > $ta
+rget;
return( \@half, \@in ) if $soFar == $target;
$diff = abs( $soFar - $target );
( $best, @best ) = ( $diff, @half ) if $diff < $best;
@in = shuffle @in;
}
my %seen; $seen{ $_ }++ for @best;
## return \@best, [ grep !$seen{ $_ }--, @$aRef ]; ## Fix duplicate
+s bug
return \@best, [ grep{ !exists $seen{ $_ } or !$seen{ $_ }-- } @$a
+Ref ];
}
for (
[ 62, 83, 121, 281, 486, 734, 771, 854, 885, 1003 ],
[ 7,10,12,15,40 ],
[ 8, 14, 32, 29 ],
[ 41, 37, 37, 43 ],
[ 99, (1)x99 ],
[ 1 .. 99 ],
[ map $_*2+1, 1 .. 50 ],
[ map int( rand 1000 ), 1 .. 100 ],
) {
my( $a1, $a2 ) = partition( 1e2, $_ );
my( $t1, $t2 ) = ( sum( @$a1), sum( @$a2 ) );
print "\n@$a1 := ", $t1;
print "@$a2 := ", $t2;
print "Diff: ", abs( $t1 - $t2 );
}
The limit is specified as an input parameter. Above I'm using a hardcoded limit of 100, but that might be better specified in terms of the number of input values. Say, @input * 10 might be a starting point. The longer you're prepared to wait, the better solution it will generally find. It will return quickly if a perfect solution is found.
Testing is limited to what you see above. Anyone know of any particularly hard cases?
c:\test>708290.pl
734 771 854 281 := 2640
121 62 83 885 486 1003 := 2640
Diff: 0
7 10 12 15 := 44
40 := 40
Diff: 4
8 32 := 40
14 29 := 43
Diff: 3
37 41 := 78
37 43 := 80
Diff: 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 := 99
99 := 99
Diff: 0
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
+30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
+53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 := 2475
94 1 72 99 2 81 96 86 79 74 91 85 92 89 76 71 98 88 93 97 77 84 78 90
+73 75 95 3 4 87 80 83 82 := 2475
Diff: 0
63 65 67 69 71 73 101 79 99 11 75 91 81 87 95 5 37 13 41 77 := 1300
27 29 3 33 21 53 85 61 23 17 47 49 45 51 35 89 7 55 43 57 59 15 25 31
+39 9 93 83 97 19 := 1300
Diff: 0
248 28 269 882 847 389 242 192 484 519 123 259 528 363 410 722 110 34
+501 922 695 622 831 48 464 234 548 815 702 725 403 629 29 579 365 81
+294 47 927 879 639 470 902 318 313 736 228 230 750 38 620 77 702 426
+582 := 25020
923 457 738 7 847 234 696 334 200 989 231 744 528 402 619 485 791 272
+261 533 525 298 724 615 253 71 957 836 904 105 575 929 537 802 791 94
+1 437 538 740 165 229 528 867 609 753 := 25020
Diff: 0
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
Sorry to disappoint you BrowserUk,
your sub does not work if a number is repeated in the array.
Try @array = (33,33,37);
The solution is good in every other case, though. Only need that to be fixed.
Thanks a lot for your effort, anyways.
Pepe
| [reply] |
|
|
return \@best, [ grep{ !exists $seen{ $_ } or !$seen{ $_ }-- } @$a
+Ref ];
Basically, because the sub only stores one best partition, when it runs out of iterations, it needs to filter the input array to generate teh other half. The filter was a sloppy variation on a theme that works for other purposes but not this. I think the above corrects it, but I will need to give more thought once my brain has awoken.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
|
|
|
|
|
|
|
|
Re: Divide array of integers into most similar value halves (good enough)
by tye (Sage) on Sep 02, 2008 at 05:42 UTC
|
Yay for fancy comp sci terms. It is NP-complete! (or is it "NP hard"?)
Given 100 numbers, this code finds a likely-optimal solution in about 1 second. If you are unlucky, it can spend a very long time after that not finding any better solutions. :)
#!/usr/bin/perl -w
use strict;
sub halfWeights
{
my @weights= sort { $b <=> $a } @_;
my $dist= 0;
$dist += $_ for @weights;
$dist /= 2;
my $best= $dist;
my @sol;
my @idx= ( 0 );
while( 1 ) {
$dist -= $weights[$idx[-1]];
for( abs($dist) ) {
if( $_ < $best ) {
$best= $_;
@sol= @idx;
printf STDERR "%+g: %s\n", $_, join( ", ", @weights[@i
+dx] );
return @weights[ @sol ]
if( 0 == $_ );
}
}
if( 0 < $dist ) {
push @idx, 1 + $idx[-1]
} else {
$dist += $weights[ $idx[-1]++ ];
}
while( @weights <= $idx[-1] ) {
pop @idx;
return @weights[ @sol ]
if( 1 == @idx );
$dist += $weights[ $idx[-1]++ ];
}
}
}
@ARGV= ( 100 )
if( ! @ARGV );
push @ARGV, $ARGV[0]*$ARGV[0]
if( 1 == @ARGV );
if( 2 == @ARGV ) {
my $cnt= shift @ARGV;
my $max= shift @ARGV;
push @ARGV, 1 + int rand($max)
while( @ARGV < $cnt );
} elsif( 3 == @ARGV ) {
my $cnt= shift @ARGV;
while( @ARGV < $cnt ) {
push @ARGV, $ARGV[-2] + $ARGV[-1];
}
}
halfWeights( @ARGV );
Note that if you have fractional weights, then the way that things are computed will likely cause a growing accumulation of errors that won't impact the likely-optimal solution much but could cause serious misrepresentation of how close other solutions really are if you wait the hundreds of years and more that it could spend contemplating them.
| [reply] [d/l] |
Re: Divide array of integers into most similar value halves
by swampyankee (Parson) on Sep 01, 2008 at 21:08 UTC
|
| [reply] |
Re: Divide array of integers into most similar value halves
by psini (Deacon) on Sep 01, 2008 at 20:22 UTC
|
If you want the best solution, 100 values is not a small list for, at first approximation you need about (n/2)! tries to check all the possible partitions.
You could use a couple of tricks to reduce the range of solutions:
- Say N the sum of all the elements, divided by two, and floored; say S(X) the sum of the elements of a given subset X of your list. Your problem can be reduced to find the subset P with the minimum ABS(S(P)-N) within all the possible subsets. The second set Q is obviously given by the difference between the original list and P.
- Say M the greatest element of the list. You can safely assume that it is part of the solution (all elements are) so you can take it off from the list, decrease N by M and apply the previous point to the shortened list and the reduced N. If your list has a great dispersion, this can lead to a significant reduction of the number of tests required.
Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."
| [reply] |
Re: Divide array of integers into most similar value halves
by GrandFather (Saint) on Sep 01, 2008 at 22:17 UTC
|
What is the bigger picture? Generally a good enough solution is a strong function of what you need the solution for. In this case a generally solution may be impractical, but a good enough solution can only be determined in the context of the problem context.
Perl reduces RSI - it saves typing
| [reply] |
Re: Divide array of integers into most similar value halves
by johndageek (Hermit) on Sep 02, 2008 at 14:38 UTC
|
#!/usr/bin/perl
## arrays to test
#@aoi = (1,33,2,5,6,2,9999,1,555,333,654,8,1,234,0,765,2,3,446,753);
#@aoi = (1,33,2,5,6,2,999,1,555,333,654,8,1,234,0,765,2,3,446,753);
#@aoi = (1,1,33,2,5,6,2,999,1,555,333,654,8,1,234,0,765,2,3,446,753);
#@aoi = (1,1,33,2,5,6,2,999,8,1,555,333,654,8,1,234,0,765,2,3,446,753)
+;
@aoi = (2406,1,1,33,2,5,6,2,999,8,1,555,333,654,8,1,234,0,765,2,3,446,
+753);
# working variables
@arr1 = ();
$sum1 = 0;
@arr2=();
$sum2 = 0;
# sort list
@saoi = sort { $a <=> $b} @aoi;
## start with highest value working downwards, pushing onto array cont
+aining
## the lowest sum. SHould give you the least available difference betw
+een array sums
for ($t=$#saoi;$t>-1;$t--){
if ($sum2 > $sum1){
$sum1 = $sum1 + $saoi[$t];
push @arr1,$saoi[$t];
}else{
$sum2 = $sum2 + $saoi[$t];
push @arr2,$saoi[$t];
}
}
$diff = $sum2 - $sum1;
print "$sum2 - $sum1 = $diff\n";
| [reply] [d/l] |
|
|
Dageek,
this is a great idea!!! I'm gonna try it. By adding the next value to the smallest group the result should be close to optimum.
Also simple and fast
Thanks a lot
Pepe
| [reply] |
|
|
It'll fail for e.g. (3,3,2,2,2) and other sets where the top two numbers are odd, the rest even, and the ideal split is even. (and other cases too, e.g. (10,10,4,4,4,4,4))
| [reply] |
|
|
|
|
|
|
|
|
Re: Divide array of integers into most similar value halves
by tilly (Archbishop) on Sep 02, 2008 at 05:58 UTC
|
Many comments to the contrary notwithstanding, the odds are very good that your actual problem is not really NP-complete. My discussion of why not is kind of long, though, so I posted it as a meditation at NP-complete sometimes isn't. | [reply] |
Re: Divide array of integers into most similar value halves
by praveeperl (Initiate) on Sep 02, 2008 at 10:59 UTC
|
my @array = (8,14,32,29);
@new = sort{$a <=> $b}@array;#sorting
$i = 'first';#set flag
foreach (@new){
if ($i eq 'first'){
push (@subarray1, $_);
$i = 'second';
}else{
push (@subarray2, $_);
$i = 'first'
}
}
output:
@subarray1 = (8,32) # total value 40
@subarray2 = (14,29) # total value 43
| [reply] [d/l] |
|
|
| [reply] |
Re: Divide array of integers into most similar value halves
by bduggan (Pilgrim) on Sep 02, 2008 at 19:01 UTC
|
This sounds like a good dynamic programming exercise :
use List::Util qw/sum/;
use Memoize; memoize('sum_to');
use strict;
sub sum_to {
# given : $n, @ary
# return : two array refs, the first one summing to $n
# the second one has the remaining elements
# return nothing if it's impossible
my $n = shift;
my @ary = @_;
return [[],\@ary] if $n==0;
return if ($n<0 || @ary==0);
for my $elem (@ary) {
my %seen;
my @left = grep { $_ != $elem || $seen{$_}++ } @ary;
if (my $found = sum_to($n-$elem,@left)) {
return [ [ $elem, @{ $found->[0] } ],
[ @{ $found->[1] } ]
];
}
}
return;
}
my @nums = map int rand 1000, 1..100;
my $sum = sum @nums;
my $target = int ( $sum / 2);
while ($target > 0) {
my $found = sum_to($target,@nums) or next;
my $first = $target ." == ". join '+', @{ $found->[0] };
my $second = ($sum-$target)." == ". join '+', @{ $found->[1] };
print join "\n",$first, $second;
die "there was a problem" unless eval $first && eval $second;
last;
} continue {
$target--;
}
I think this might fail in the case where there are two ways to sum to a given number, and only one of those ways is right -- to account for this, sum_to shd probably return an array of solutions.
| [reply] [d/l] |
Re: Divide array of integers into most similar value halves
by FunkyMonk (Bishop) on Sep 02, 2008 at 21:44 UTC
|
Just out of interest, why do you want it? What's the real application? | [reply] |
|
|
I'm not sure how to explain this... I have DNA sequence alignments, something like sequences of letter piled up so the similar bases (letters) are in top of each other. Then, each sequence has assigned a quality scores.
Something like:
A(45) C(44) T(44) A(45)
A(31) T(31) T(35) A(37)
A(50) C(52) A(52) A(52)
I'm looking for variation in those sequences, but in order to find variation that is reliable I've calculated that I need to find groups of sequences with the same base that add up to at least quality 50.
So in the previous example the second column C(44+52=96)/T(31) does not have enough quality to be considered (only one of the bases reaches the required quality), but the third column T(44+35=79)/A(52) does.
With your script I was trying to estimate how many of those positions in the alignment can I even consider analyzing, ie. how many of those positions (array of quality scores) can be separated in two subarrays that pass the threshold.
Sorry if it's not clear.
Pepe
| [reply] |
|
|
| [reply] |
|
|