Re: NP-complete sometimes isn't (A benchmark)

Here are some typical results from benchmarking your routine along with some of the routines in the other thread. (I omitted ikegami's as he wasn't happy with it):

c:\test>708290-b -LOG=4 -MAX=1e3
Testing buk    with    10 random values in the range 0 .. 1e3 Difference:=     1; took 0.000244 seconds
Testing funky  with    10 random values in the range 0 .. 1e3 Difference:=    65; took 0.000118 seconds
Testing tilly  with    10 random values in the range 0 .. 1e3 Difference:=     1; took 0.004491 seconds
Testing tye    with    10 random values in the range 0 .. 1e3 Difference:=     1; took 0.001024 seconds

Testing buk    with   100 random values in the range 0 .. 1e3 Difference:=     1; took 0.005047 seconds
Testing funky  with   100 random values in the range 0 .. 1e3 Difference:=     1; took 0.000587 seconds
Testing tilly  with   100 random values in the range 0 .. 1e3 Difference:=     1; took 15.174596 seconds
Testing tye    with   100 random values in the range 0 .. 1e3   ******* timed out after 60 seconds

Testing buk    with  1000 random values in the range 0 .. 1e3 Difference:=     1; took 0.015625 seconds
Testing funky  with  1000 random values in the range 0 .. 1e3 Difference:=     1; took 0.007535 seconds
Testing tilly  with  1000 random values in the range 0 .. 1e3   ******* timed out after 60 seconds
Testing tye    with  1000 random values in the range 0 .. 1e3   ******* timed out after 60 seconds

Testing buk    with 10000 random values in the range 0 .. 1e3 Difference:=     1; took 0.075423 seconds
Testing funky  with 10000 random values in the range 0 .. 1e3 Difference:=     1; took 0.190954 seconds
Testing tilly  with 10000 random values in the range 0 .. 1e3   ******* timed out after 60 seconds
Testing tye    with 10000 random values in the range 0 .. 1e3   ******* timed out after 60 seconds

c:\test>708290-b -LOG=4 -MAX=1e4
Testing buk    with    10 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.000326 seconds
Testing funky  with    10 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.000102 seconds
Testing tilly  with    10 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.003578 seconds
Testing tye    with    10 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.001063 seconds

Testing buk    with   100 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.044324 seconds
Testing funky  with   100 random valuesin the range 0 .. 1e4 Difference:=     7; took 0.000601 seconds
Testing tilly  with   100 random valuesin the range 0 .. 1e4 Difference:=     1; took 15.901730 seconds
Testing tye    with   100 random valuesin the range 0 .. 1e4   ******* timed out after 60 seconds

Testing buk    with  1000 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.515625 seconds
Testing funky  with  1000 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.008393 seconds
Testing tilly  with  1000 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.013012 seconds
Testing tye    with  1000 random valuesin the range 0 .. 1e4 Difference:= 92467; took 0.016869 seconds

Testing buk    with 10000 random valuesin the range 0 .. 1e4 Difference:=     0; took 4.654061 seconds
Testing funky  with 10000 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.136925 seconds
Testing tilly  with 10000 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.148673 seconds
Testing tye    with 10000 random valuesin the range 0 .. 1e4 Difference:= 2252553; took 1.632284 seconds

c:\test>708290-b -LOG=4 -MAX=1e4
Testing buk    with    10 random valuesin the range 0 .. 1e4 Difference:=     2; took 0.003548 seconds
Testing funky  with    10 random valuesin the range 0 .. 1e4 Difference:=   198; took 0.000107 seconds
Testing tilly  with    10 random valuesin the range 0 .. 1e4 Difference:=     2; took 0.003799 seconds
Testing tye    with    10 random valuesin the range 0 .. 1e4 Difference:=     2; took 0.001117 seconds

Testing buk    with   100 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.213406 seconds
Testing funky  with   100 random valuesin the range 0 .. 1e4 Difference:=     2; took 0.000587 seconds
Testing tilly  with   100 random valuesin the range 0 .. 1e4 Difference:=     0; took 0.003341 seconds
Testing tye    with   100 random valuesin the range 0 .. 1e4 Difference:=  1282; took 0.001121 seconds

Testing buk    with  1000 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.007796 seconds
Testing funky  with  1000 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.007606 seconds
Testing tilly  with  1000 random valuesin the range 0 .. 1e4   ******* timed out after 60 seconds
Testing tye    with  1000 random valuesin the range 0 .. 1e4   ******* timed out after 60 seconds

Testing buk    with 10000 random valuesin the range 0 .. 1e4 Difference:=     1; took 4.281250 seconds
Testing funky  with 10000 random valuesin the range 0 .. 1e4 Difference:=     1; took 0.150492 seconds
Testing tilly  with 10000 random valuesin the range 0 .. 1e4   ******* timed out after 60 seconds
Testing tye    with 10000 random valuesin the range 0 .. 1e4   ******* timed out after 60 seconds

The full benchmark code is here:

#! perl -slw
use strict;
use List::Util qw[ shuffle sum ];
use List::MoreUtils qw(first_index);
use Time::HiRes qw[ time ];

our $MAX    ||= 1e4;    ## Maximum random values
our $V      ||= 0;      ## Causes the partitions to be printed
our $LOG    ||= 4;      ## No of logarithmic steps;
                        ## 4 means 10, 100, 1000, 10000

my %tests = (
    buk     => sub { return buk( 20 * @{ $_[0] }, $_[0] ); },
    funky   => sub { return FunkyMonk( $_[ 0 ] ); },
    tilly   => sub { return tilly( @{ $_[ 0 ] } ); },
    tye     => sub {
        my @part1 = tye( @{ $_[ 0 ] } );

        my %seen; $seen{ $_ }++ for @part1;
        my @part2 = grep !$seen{ $_ }--, @{ $_[ 0 ] };

        return( \@part1, \@part2 );
     },
);

for my $n ( map 0+"1e$_", 1 .. $LOG ) {
    my @data = map int( rand $MAX ), 1 .. $n;
    for my $test ( sort keys %tests ) {
        printf "Testing %-6s with %5d random values "
              . "in the range 0 .. $MAX ",
            $test, $n;
        my( $part1, $part2 );
        my( $start, $stop );
        eval {
            $SIG{ALRM} = sub{ die };
            alarm( 60 );
            $start = time();
            ( $part1, $part2 ) = $tests{ $test }->( \@data );
            $stop = time();
            alarm( 0 )
        };
        print "\t\t******* timed out after 60 seconds"
            and next if $@;
        my $t1 = sum @$part1;
        my $t2 = sum @$part2;

        print "\n[@$part1] := $t1" if $V;
        print "[@$part2] := $t2" if $V;
        printf "Difference:= %5d; took %f seconds\n",
            abs( $t1 - $t2 ), $stop - $start;
    }
    print '';
}

sub FunkyMonk {
    my @numbers = reverse sort { $a <=> $b } @{+shift};
    my $target = sum(@numbers) / 2;

    my @b;
    while ( 1 ) {
        my $index = first_index { $_ <= $target } @numbers;
        last if $index < 0;
        $target -= $numbers[$index];
        push @b, splice @numbers, $index, 1;
    }
    return \@b, \@numbers;
}

sub tye {
    my @weights= sort { $b <=> $a } @_;
    my $dist= 0;
    $dist += $_ for @weights;
    $dist /= 2;
    my $best= $dist;
    my @sol;
    my @idx= ( 0 );
    while( 1 ) {
        $dist -= $weights[$idx[-1]];
        for( abs($dist) ) {
            if( $_ < $best ) {
                $best= $_;
                @sol= @idx;
                return @weights[ @sol ]
                    if( 0 == $_ );
            }
        }
        if( 0 < $dist ) {
            push @idx, 1 + $idx[-1]
        } else {
            $dist += $weights[ $idx[-1]++ ];
        }
        while( @weights <= $idx[-1] ) {
            pop @idx;
            return @weights[ @sol ]
                if( 1 == @idx );
            $dist += $weights[ $idx[-1]++ ];
        }
    }
}

sub buk {
    my( $limit, $aRef ) = @_;
    my @in = sort{ $a <=> $b } @$aRef;
    my $target = sum( @in ) >> 1;
    my( $best, @best ) = 9e99;
    my $soFar = 0;
    my @half;
    for( 1 .. $limit ) {
        #print "$soFar : [@half] [@in] [@best]"; <>;
        $soFar += $in[ 0 ], push @half, shift @in
            while $soFar < $target;
        return( \@half, \@in ) if $soFar == $target;

        my $diff = abs( $soFar - $target );
        ( $best, @best ) = ( $diff, @half ) if $diff < $best;

        $soFar -= $half[ 0 ], push @in, shift @half
            while $soFar > $target;
        return( \@half, \@in ) if $soFar == $target;

        $diff = abs( $soFar - $target );
        ( $best, @best ) = ( $diff, @half ) if $diff < $best;

        @in = shuffle @in;
    }
    my %seen; $seen{ $_ }++ for @best;
    return \@best, [ grep !$seen{ $_ }--, @$aRef ];
}

sub tilly {
  my @numbers = sort {abs($b) <=> abs($a) or $a <=> $b} @_;

  # First we're going to find a "pretty good" partition.
  # If we can, we'll look for a partition that finishes off
  # like this one does.  That can short-cut the full
  # algorithm.
  my @in_partition;
  my $current_remaining = 0;
  for my $n (@numbers) {
    if ($current_remaining < 0) {
      if ($n > 0) {
        push @in_partition, 1;
        $current_remaining += $n;
      }
      else {
        push @in_partition, 0;
        $current_remaining -= $n;
      }
    }
    else {
      if ($n > 0) {
        push @in_partition, 0;
        $current_remaining -= $n;
      }
      else {
        push @in_partition, 1;
        $current_remaining += $n;
      }
    }
  }

  my $known_solution = $current_remaining;

  # Cheat, we're going to find out the extremes.
  my @max_sum_of_previous = 0;
  my $sum = 0;
  for my $n (@numbers) {
    $sum += abs($n);
    push @max_sum_of_previous, $sum;
  }

  # We're going to try to find partitions that add up to
  # each possible number that can be added up to.
  my $old;
  my $new = {0 => [[], []]};

  my $i = -1;
  my $answer;
N:  for my $n (@numbers) {
    $old = $new;
    $new = {};
    $i++;

    while (my ($key, $value) = each %$old) {
      if ($key == -$current_remaining) {
        # We've found our match!
        $answer = $value;
        last N;
      }

      if (
        abs($key)
          > $sum - $max_sum_of_previous[$i] + abs($known_solution)
      ) {
        # We're too far away from 0 to possibly beat the
        # "pretty good" partition.  So skip.
        next;
      }

      my ($p1, $p2) = @$value;
      $new->{$key + $n} ||= [[$n, $p1], $p2];
      $new->{$key - $n} ||= [$p1, [$n, $p2]];
    }

    # Adjust $current_remaining for the fact we're skipping
    # the $i'th element.
    if ($in_partition[$i]) {
      $current_remaining -= $n;
    }
    else {
      $current_remaining += $n;
    }
  }

  if (not $answer) {
    $i++; # We need to not append the tail!
    my $best = each %$new;

    while (my $difference = each %$new) {
      if (abs($difference) < abs($best)) {
        $best = $difference;
      }
    }

    $answer = $new->{$best};
  }

  # We need to flatten our nested arrays and add the tail.
  my ($p1, $p2) = @$answer;

  my @part_1;
  while (@$p1) {
    push @part_1, $p1->[0];
    $p1 = $p1->[1];
  }
  push @part_1
    , map {
        $in_partition[$_] ? $numbers[$_] : ()
      } $i..$#numbers;

  my @part_2;
  while (@$p2) {
    push @part_2, $p2->[0];
    $p2 = $p2->[1];
  }
  push @part_2
    , map {
        $in_partition[$_] ? () : $numbers[$_]
      } $i..$#numbers;

  return (\@part_1, \@part_2);
}

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: NP-complete sometimes isn't (A benchmark) by shmem (Chancellor) on Sep 02, 2008 at 10:27 UTC
But! It looks like FunkyMonk's code doesn't always get the best partition:

perl funky.pl 400 402 521 735 758 191 307 679 776 877
Target is 2823
First container: sum(877 776 758 402) = 2813
Second container: sum(735 679 521 400 307 191) = 2833

but a better partition is

First container: sum(400 402 521 735 758) = 2816
Second container: sum(191 307 679 776 877) = 2830

Update: ah, of course. That is what your "Difference" output field reveals....
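
For anyone who wants to check those two partitions without funky.pl to hand, here is a minimal sketch. It assumes the greedy pass works like the FunkyMonk sub in the benchmark above (largest remaining value that still fits under the target); `greedy_split` and `spread` are names made up for this illustration, and only core List::Util is used.

```perl
use strict;
use warnings;
use List::Util qw( sum first );

# Greedy pass in the style of the FunkyMonk sub from the benchmark:
# repeatedly take the largest remaining value that still fits under the target.
sub greedy_split {
    my @numbers = sort { $b <=> $a } @_;
    my $target  = sum( 0, @numbers ) / 2;
    my @taken;
    while (1) {
        my $index = first { $numbers[$_] <= $target } 0 .. $#numbers;
        last unless defined $index;
        $target -= $numbers[$index];
        push @taken, splice @numbers, $index, 1;
    }
    return ( \@taken, \@numbers );
}

# Absolute difference between the sums of two partitions.
sub spread {
    my ( $p1, $p2 ) = @_;
    return abs( sum( 0, @$p1 ) - sum( 0, @$p2 ) );
}

my @input       = qw( 400 402 521 735 758 191 307 679 776 877 );
my $greedy_diff = spread( greedy_split(@input) );
my $better_diff = spread( [ 400, 402, 521, 735, 758 ], [ 191, 307, 679, 776, 877 ] );
print "greedy: $greedy_diff, hand-picked: $better_diff\n";
```

The greedy split comes out 20 apart (2813 vs 2833) and the hand-picked one 14 apart (2816 vs 2830), which matches the sums quoted above.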
Re^3: NP-complete sometimes isn't (A benchmark) by BrowserUk (Patriarch) on Sep 02, 2008 at 11:06 UTC
Indeed, but I don't think that he ever made that claim? (By contrast, tilly (TTBOMK correctly) did.)

And neither will mine, always. Much of the time it does. But, even with the same inputs, due to the semi-random nature of the algorithm, it will occasionally give up trying before it finds the optimum solution. I've never seen it be very far away from optimum, but it doesn't always find it within the specified iterations bound.

But, given that it will most times partition a 1 million value dataset within 8 to 10 seconds:

Testing buk with 1000000 random values in the range 0 .. 1e4 Difference:= 0; took 8.156250 seconds

where a brute force solution for 100 values would theoretically take around 17 years, the occasional imperfect solution is a fair trade, I reckon.

I tried to use the same math to calculate the theoretical time for a 1 million value dataset, but nothing I have access to can calculate 2**1e6 in a timeframe that I was prepared to wait for :)

2**100 ~ 10e30;
2**1000 ~ 10e300;
2**10000 ~ 10e3000;
2**100000 ~ 10e30000;

So, by projection:

2**1000000 ~ 10e300000;

Which, if my sleep-deprived brain isn't dissin' me, represents something like 100,000 years of processing.

Will I trade that for a 10 second response that's occasionally non-optimal? Sure :)
Re^4: NP-complete sometimes isn't (A benchmark) by moritz (Cardinal) on Sep 02, 2008 at 11:21 UTC
"I tried to use the same math to calculate the theoretical time for 1 million value dataset, but nothing I have access to can calculate 2**1e6 in a timeframe that I was prepared to wait for :)"

Here it is, if you happen to care about the answer. But if your goal is just to rewrite that number as a power of 10, you don't need to calculate it in the first place:

2 ** $y == 10 ** $x
log(2 ** $y) == log(10 ** $x)
$y * log(2) == $x * log(10)
$x == $y * log(2) / log(10)

... which also explains why your estimate is right ;-)

"Which if my sleep deprived brain isn't dissin' me, that represents something like 100,000 years of processing."

More in the order of 10e300000 years. The number of seconds per year is about 3*10e7, so it's something like 10e299993 years. In comparison, the universe is about 10e10 years old. When you can do 10e10 calculations per second you're still at 10e299983.
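
The last line of that derivation translates directly into a one-liner. This is only a sketch of the arithmetic; `pow2_as_pow10` is a name invented for the example.

```perl
use strict;
use warnings;

# Exponent $x such that 2 ** $y == 10 ** $x, from $x == $y * log(2) / log(10).
sub pow2_as_pow10 {
    my ($y) = @_;
    return $y * log(2) / log(10);
}

# The same steps as the projection in the parent post: 100, 1000, ... 1e6.
printf "2 ** %d ~ 10 ** %.1f\n", $_, pow2_as_pow10($_)
    for 100, 1_000, 10_000, 100_000, 1_000_000;
```

Since log(2) / log(10) is roughly 0.30103, each factor of ten in $y adds one digit to the exponent, which is why the projection by pattern works.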
Re^2: NP-complete sometimes isn't (A benchmark) by tilly (Archbishop) on Sep 03, 2008 at 17:16 UTC
My internet was down for a while, so I amused myself by optimizing my code a little. To be specific, I only look at pairs of partitions with a positive difference, I moved the "break out early" checks out of the inner loop, and I switched from using hashes to arrays.
Re^3: NP-complete sometimes isn't (A benchmark) by BrowserUk (Patriarch) on Sep 04, 2008 at 08:29 UTC
Can I offer a suggestion? Okay, I'm going to offer a suggestion, whether you will have the time or inclination to do anything with it... :)

I think it would greatly simplify and optimise your algorithm to avoid the conditionals associated with negative numbers. You can do that by applying a simple sort to the inputs and then adjusting the whole array to make it all positive (and undo it on output):

sub partition {
    my @numbers = sort{ $b <=> $a } @_;
    my $adjust = 0;
    if( $numbers[ -1 ] < 0 ) {
        $adjust = - $numbers[ -1 ];
        $_ += $adjust for @numbers;
    }
    ...
    $_ -= $adjust for @part_1, @part_2;
    return \@part_1, \@part_2;
}

Note: I attempted to make this change and offer it back to you, but there is something about your algorithm that I am not understanding, because my attempts disappear up their own jackssies (sp?:).
Re^4: NP-complete sometimes isn't (A benchmark) by tilly (Archbishop) on Sep 04, 2008 at 15:16 UTC
I thought of that and you can't do it. Suppose we have 10 numbers and the optimal partition has 6 in one side and 4 in the other. Then you've added 6 * $adjust to one side and 4 * $adjust to the other, and the partition is no longer going to look optimal.

I did think about making all numbers be their absolute value, and then reverse which partition the negative numbers go in, but that logic looked more complicated and convoluted than the way I originally wrote it.
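
A small worked case makes the objection concrete. The sketch below is not anyone's posted code: `best_split` is a throwaway exhaustive search (fine for five values, hopeless beyond a couple of dozen) and `split_via_shift` is a hypothetical wrapper that applies the shift-and-undo idea around it.

```perl
use strict;
use warnings;
use List::Util qw( sum min );

# Exhaustive search, only sensible for tiny lists: try every subset and
# keep the first split with the smallest difference between the two sums.
sub best_split {
    my @nums = @_;
    my ( $best, $part_1, $part_2 );
    for my $mask ( 0 .. 2**scalar(@nums) - 1 ) {
        my ( @in, @out );
        for my $i ( 0 .. $#nums ) {
            if ( $mask & ( 1 << $i ) ) { push @in,  $nums[$i] }
            else                       { push @out, $nums[$i] }
        }
        my $diff = abs( sum( 0, @in ) - sum( 0, @out ) );
        next if defined $best && $diff >= $best;
        ( $best, $part_1, $part_2 ) = ( $diff, \@in, \@out );
    }
    return ( $best, $part_1, $part_2 );
}

# Shift all values up so none is negative, solve, undo the shift, and
# report the difference of the resulting partition of the ORIGINAL values.
sub split_via_shift {
    my @nums   = @_;
    my $adjust = min(@nums) < 0 ? -min(@nums) : 0;
    my ( undef, $p1, $p2 ) = best_split( map { $_ + $adjust } @nums );
    my @part_1 = map { $_ - $adjust } @$p1;
    my @part_2 = map { $_ - $adjust } @$p2;
    return abs( sum( 0, @part_1 ) - sum( 0, @part_2 ) );
}

my @numbers = ( 6, 5, 2, 2, -3 );
my ($direct) = best_split(@numbers);
printf "direct: %d, via shift: %d\n", $direct, split_via_shift(@numbers);
```

For 6 5 2 2 -3 the only perfect split is 6 against 5 2 2 -3, a 1-versus-4 split. After adding 3 to everything the list becomes 9 8 5 5 0, whose best splits are 13 against 14; mapped back, those are 2 or 4 apart, never 0.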
Re^3: NP-complete sometimes isn't (A benchmark) by gone2015 (Deacon) on Sep 22, 2008 at 14:58 UTC
Sadly, I got interested in this problem.

Although it is nominally O(2**(N-1)), it does seem possible to do quite well for a lot less work -- and, the larger the set, the easier it appears to get !!

So the trick with any decision problem is to prune the search space. tilly observed that although the search space is 2**(N-1), the number of unique nodes may be a lot smaller.

The other trick in the bag is to see if some heuristic can help us prune. As others have observed, the set can be partitioned quite effectively by:

- sorting the set into descending order
- assigning values in turn to the partition with the smaller sum at the time

This 'hack' works wonderfully for sets containing numbers drawn from a range 1..n and pretty well for m..n -- and, the bigger the set, the better it works !

(I tested this against putting values into one partition until it just exceeds the perfect split -- starting with the set sorted in descending order and with random selection from the set. I even tried putting values into one partition until it just doesn't exceed the perfect split, and then finding the best remaining value. No better.)

Looking at the result of the initial 'hack', one can see numbers that could be swapped between partitions to improve things. This 'refinement' is a linear scan down the two partitions -- a bit worse than linear if numbers can be exchanged.

I tried these heuristics on 2000 sets of numbers drawn randomly from 1..99, 2000 sets drawn from 1..9999 and 100 from 9000..9999; for sets of length 31, 100 and 500:

:            :   2000 x 1..99    :  2000 x 1..9999   : 100 x 9000..9999  :
:            : perf.%  av. delta : perf.%  av. delta : perf.%  av. delta :
:------------:-------------------:-------------------:-------------------:
:  31: hack  :  44.4%    +1.117  :   3.0%  +131.786  :   0.0% +4500.720  :
:    : refine:  97.7%    +0.024  :   3.5%   +25.639  :   0.0% +1435.640  :
:    : best  : 100.0%    +0.000  : 100.0%    +0.000  :   0.0% +1015.820  :
:------------:-------------------:-------------------:-------------------:
: 100: hack  :  84.5%    +0.174  :   3.0%   +38.562  :  13.0%    +4.640  :
:    : refine: 100.0%    +0.000  :  30.8%    +2.292  :  95.0%    +0.050  :
:    : best  : 100.0%    +0.000  : 100.0%    +0.000  : 100.0%    +0.000  :
:------------:-------------------:-------------------:-------------------:
: 500: hack  : 100.0%    +0.000  :   9.7%    +7.560  :  50.0%    +0.750  :
:    : refine: 100.0%    +0.000  :  99.9%    +0.001  : 100.0%    +0.000  :
:    : best  : 100.0%    +0.000  : 100.0%    +0.000  : 100.0%    +0.000  :

where `perf.%` is the percentage of perfectly partitioned sets after the given operation, and `av. delta` is the average distance from perfect, over all sets. The 'best' line shows the best possible outcome.

Note that the longer the set:

- the more likely there is to be a perfect partition, and
- the more likely the 'hack' and 'refine' operations are to find one !!

Having got that far, how does one search for the best possible partition ? Looking at the problem as a tree walk, what I tried was:

- start at the node identified by the initial 'hack' & 'refine', and work back up the tree (making sure that all nodes are considered exactly once).
- discard parts of the tree whose minimum result is greater than the best so far, or whose maximum result is less than the best so far -- minimax.
- discard parts of the tree which have already been visited with the same partition total -- similar to the trick used in 708384
- build a table for all combinations of the last 'n' numbers in the set, which short circuits the search of the lowest levels of the tree. Building the table is not free, so it's built progressively, when there appears to be a need.
- limit the search, so that it doesn't disappear for ever !

This turned out to be quite tricky for a bear-of-little-brain, and, with all the debug and test stuff, runs to 2800 lines.

Anyway, I ran this on the same test data, and collected the average running time per partition operation:

:     :   2000 x 1..99    :  2000 x 1..9999   : 100 x 9000..9999  :
:-----:-------------------:-------------------:-------------------:
:  31 : GMCH  127.20 uSec : GMCH    4.26 mSec : GMCH  615.67 mSec :
:     : tilly  32.18 mSec : tilly   2.55 Secs : tilly   6.34 Secs :
:-----:-------------------:-------------------:-------------------:
: 100 : GMCH  298.18 uSec : GMCH    9.59 mSec : GMCH  733.82 uSec :
:     : tilly 606.84 mSec : tilly  65.82 Secs : tilly 109.46 mSec :
:-----:-------------------:-------------------:-------------------:
: 500 : GMCH    1.76 mSec : GMCH    1.87 mSec : GMCH    1.84 mSec :
:     : tilly  19.89 Secs : tilly ***         : tilly *           :

where 'GMCH' is my code, 'tilly' is the code given in 708384, uSec is micro-Secs and mSec is milli-secs. The '***' is where I terminated the test, because it had run out of memory and started to thrash.

The code is available here http://www.highwayman.com/perls/part_GMCH_v1.14.pm I can post it here if y'all like.

What has surprised me is that this works as well as it does, particularly as the set gets bigger ! I wonder if I just haven't found the right test data :-( I'd be pleased to hear of stuff that this won't cope with.

For completeness, I'm only handling +ve values. To cope with -ve values the minimax and other pruning has to be modified to do different things depending on which side of the perfect partition one is on at any time.
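
The initial 'hack' described above (sort descending, then give each value to whichever partition currently has the smaller sum) can be sketched in a few lines. This is an illustration only and not the linked GMCH code; `hack_partition` is an invented name, and sending ties to the first partition is an arbitrary choice.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Descending-order greedy: each value goes to the partition whose running
# sum is currently smaller (ties go to the first partition).
sub hack_partition {
    my @sorted = sort { $b <=> $a } @_;
    my ( @part_1, @part_2 );
    my ( $sum_1, $sum_2 ) = ( 0, 0 );
    for my $n (@sorted) {
        if ( $sum_1 <= $sum_2 ) { push @part_1, $n; $sum_1 += $n }
        else                    { push @part_2, $n; $sum_2 += $n }
    }
    return ( \@part_1, \@part_2 );
}

my ( $p1, $p2 ) = hack_partition( 4, 8, 6, 5, 7 );
printf "[%s] = %d, [%s] = %d\n", "@$p1", sum( 0, @$p1 ), "@$p2", sum( 0, @$p2 );
```

On 8 7 6 5 4 this gives 8 5 4 (17) against 7 6 (13), while 8 7 against 6 5 4 is a perfect 15/15. That is the kind of gap a follow-up pass like the 'refine' step is meant to close.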
Re^4: NP-complete sometimes isn't (A benchmark) by tilly (Archbishop) on Sep 23, 2008 at 05:33 UTC
That's a nice approach. If you want to extend it to negative numbers, it will be less work than you think if you use the right hack.

As far as the difference is concerned, having, say, -16 in one partition is exactly the same as having 16 in the other one. So flip all of your signs to positive, then when you've found the solution, delete the appropriate positive ones from one partition while inserting negatives in the other.

Voila! Rather than a pervasive logic change you just have to pre-process the list and post-process your answer.
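
A rough sketch of that pre-process / post-process wrapper follows. The solver in the middle is a stand-in (the descending-order greedy pass), used on the assumption that any positive-only partitioner returning index lists could be dropped in; `split_positive_indices` and `split_signed` are names made up for the example. Working with indices rather than values keeps track of which entries had their sign flipped, even when the list contains both 16 and -16.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Stand-in solver for non-negative values. Returns two lists of indices
# into its argument list, using the descending-order greedy pass.
sub split_positive_indices {
    my @values = @_;
    my ( @side_a, @side_b );
    my ( $sum_a, $sum_b ) = ( 0, 0 );
    for my $i ( sort { $values[$b] <=> $values[$a] } 0 .. $#values ) {
        if ( $sum_a <= $sum_b ) { push @side_a, $i; $sum_a += $values[$i] }
        else                    { push @side_b, $i; $sum_b += $values[$i] }
    }
    return ( \@side_a, \@side_b );
}

sub split_signed {
    my @numbers = @_;

    # Pre-process: solve the problem on absolute values.
    my ( $side_a, $side_b ) = split_positive_indices( map { abs($_) } @numbers );

    # Post-process: an originally negative value moves to the opposite
    # partition with its sign restored, which leaves the difference unchanged.
    my ( @part_1, @part_2 );
    for my $i (@$side_a) {
        if ( $numbers[$i] < 0 ) { push @part_2, $numbers[$i] }
        else                    { push @part_1, $numbers[$i] }
    }
    for my $i (@$side_b) {
        if ( $numbers[$i] < 0 ) { push @part_1, $numbers[$i] }
        else                    { push @part_2, $numbers[$i] }
    }
    return ( \@part_1, \@part_2 );
}

my ( $p1, $p2 ) = split_signed( 8, -3, 5, -6, 2 );
printf "[%s] = %d, [%s] = %d\n", "@$p1", sum( 0, @$p1 ), "@$p2", sum( 0, @$p2 );
```

The difference between the two final sums equals the difference the solver reached on the absolute values, so the quality of the answer depends only on the positive-only solver that is plugged in.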
Re^5: NP-complete sometimes isn't (A benchmark) by Pepe (Sexton) on Sep 24, 2008 at 23:43 UTC