Re: "Divide" challenge

I decided to check to see if a brute force search scales well, so I've written a little script that generates an arbitrary matrix of N elements, solves it by brute force, and spits out the lapsed time. The conclusion? Even on a modern machine, a brute force solution doesn't scale particularly well after about 20 nodes. Here are the results from the script for 4,8,16,20 and 32 nodes:

Number of nodes: 4
Number of combinations tried: 3
solution: <0 2> <1 3>
dataflow: 176 lapsed time=0 s
--------------------------
Number of nodes: 8
Number of combinations tried: 35
solution: <0 2 3 6> <1 4 5 7>
dataflow: 460 lapsed time=0.01 s
--------------------------
Number of nodes: 16
Number of combinations tried: 6435
solution: <0 5 6 7 9 12 13 15> <1 2 3 4 8 10 11 14>
dataflow: 2722 lapsed time=0.860013 s
--------------------------
Number of nodes: 20
Number of combinations tried: 92378
solution: <0 1 2 4 5 8 10 11 12 18> <3 6 7 9 13 14 15 16 17 19>
dataflow: 3658 lapsed time=30.240456 s
--------------------------
Number of nodes: 32
Number of combinations tried: 300540195
solution: <0 3 4 6 9 12 14 18 19 22 23 25 26 27 28 31>
<1 2 5 7 8 10 11 13 15 16 17 20 21 24 29 30>
dataflow: 10701 lapsed time=116933.583916 s (32.48 hrs)
[download]

To make this a fair comparison, I wrote a combo generator that takes advantage of the structural points (order doesn't matter, we only need to look at half the combos) discussed in my response to Bellaire above. I also eliminated all but one call to grep per combo and removed the recursion.

The script used to generate this takes two parameters: scriptname node_count show_matrix where node_count is the number of nodes and show_matrix causes the matrix to be printed out if it is set to any true value. The code is

use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

my $N= $#ARGV >=0 ? $ARGV[0] : 8;
my $iMax=100;
my $bPrintMatrix = $#ARGV >= 1 ? $ARGV[1] : 0;

my $g=makeMatrix($N, $iMax);
minimizeDataFlow($N, $iMax, $g, $bPrintMatrix);
print "--------------------------\n";


#------------------------------------------------------------

sub minimizeDataFlow {
  my ($N, $iMax, $aMatrix, $bPrintMatrix) = @_;
  my $iLast = $N-1;
  my $iHalf = int($N/2);
  my $iMin=$iMax*$iHalf*$iHalf;
  my $aSolution = [];
  my $iComboCount = 0;

  my $crCombo = sub {
    my @aFirst = @{shift @_};
    my %hFirst = map { $_ => 1 } @aFirst;
    my @aSecond = grep { ! exists($hFirst{$_}) } (0..$iLast);

    my $iDataFlow = calcDataFlow($aMatrix, \@aFirst, \@aSecond);
    #print "group1: <@aFirst> group2: <@aSecond> "
    #  ."dataflow=$iDataFlow min=$iMin\n";

    if ($iDataFlow < $iMin) {
      $iMin = $iDataFlow;
      $aSolution = [ \@aFirst, \@aSecond ];
    };
    $iComboCount++;
  };


  printMatrix($aMatrix) if $bPrintMatrix;

  my $t0 = [ gettimeofday ];
  visitCombos($N, $crCombo, 0);
  my $t1 = [ gettimeofday ];
  my $tLapsed = [ $t1->[0] - $t0->[0], $t1->[1] - $t0->[1] ];
  my $iLapsed = (1000000*$tLapsed->[0] + $tLapsed->[1])/1000000;
  my $aFirst = $aSolution->[0];
  my $aSecond = $aSolution->[1];

  print "Number of nodes: $N\n";
  print "Number of combinations tried: $iComboCount\n";
  print "solution: <@$aFirst> <@$aSecond>\n";
  print "dataflow: $iMin lapsed time=$iLapsed s\n";
}

#------------------------------------------------------------

sub visitCombos {
  my ($N, $crCombo, $bPermutation) = @_;
  $bPermutation = 1 unless defined($bPermutation);
  my $iLast = $N-1;
  my $iHalf=int($iLast/2);
  my @aCombo;
  my $iCombo=0;
  my $i=0;
  my $aNext = [];
  my $hUsed={};

  for (0..$iHalf) { $aNext->[$_]=0; };
  while (1) {
    # find next value for combo element $i
    # if there are no more values left at $i, go back 1.

    my $iChosen = $aNext->[$i];
    my $iMinChosen;

    if ($bPermutation) {
      while (exists $hUsed->{$iChosen}) { $iChosen++ };
    } else {
      $iMinChosen = $i ? $aNext->[$i-1] : 0;
      $iChosen = $iMinChosen if ($iChosen < $iMinChosen);
    }
    my $iLastForI= $i ? $iLast : 0;
    #print "last: $iLastForI\n";

    while ($iChosen > $iLastForI) {
      return unless ($i > 0);
      $aNext->[$i] = 0;
      $i--;
      $iChosen = $aNext->[$i];
      $iLastForI=$i ? $iLast :0;
      #print "last: $iLastForI\n";

      if ($bPermutation) {
        delete $hUsed->{$iChosen - 1};
        while (exists $hUsed->{$iChosen}) { $iChosen++ };
      } else {
        $iMinChosen = $i ? $aNext->[$i-1] : 0;
        $iChosen = $iMinChosen if ($iChosen < $iMinChosen);
      }
    }

    # set combo element at position $i
    $aCombo[$i]=$iChosen;
    $hUsed->{$iChosen} = 1 if $bPermutation;
    $aNext->[$i] = $iChosen + 1;

    # if more elements are needed, advance to next element
    if ($i < $iHalf) { $i++; next; };

    # process completed combo
    &$crCombo(\@aCombo);

    # all elements have been assigned, move onto the next
    # combination
    delete $hUsed->{$iChosen};
  }
}

#------------------------------------------------------------

sub calcDataFlow {
  my ($aRows, $aGroup1, $aGroup2) = @_;
  my $iDataFlow=0;
  for my $x (@$aGroup2) {
    for my $y (@$aGroup1) {
      my $aRow=$aRows->[$x];
      $iDataFlow += $aRows->[$x][$y];
    }
  }
  return $iDataFlow;
}

#------------------------------------------------------------

sub makeMatrix {
  my ($N, $iMax) = @_;
  my $iLast=$N-1;
  my @f;

  #creates NxN matrix with cells where row=col undefined
  for my $x (0..$iLast) {
    for my $y ($x+1..$iLast) {
      $f[$x][$y]=$f[$y][$x]=int(rand($iMax));
    }
  }
  $f[$iLast][$iLast]=undef;  #force addition of row=col=$N

  return \@f;
}

#------------------------------------------------------------

sub printMatrix {
  my ($g) = shift;
  for (@$g) {
    my $sRow = join
      (' ', map { sprintf("%2s", defined($_) ? $_ : '-')} @$_);
    print "$sRow\n";
  }
}
[download]

Best, beth

Comment on Re: "Divide" challenge Select or Download Code

Replies are listed 'Best First'.
Re^2: "Divide" challenge by JavaFan (Canon) on Mar 11, 2009 at 09:47 UTC
Well, it scales like N! which is exponential. Which doesn't scale well at all. It basically means that if your CPU speed doubles, you can increase your problem space by 1 to get an answer in the same time. And I don't think you can avoid a brute force approach. This sounds very much like an NP complete problem.	[reply]
Re^3: "Divide" challenge by ELISHEVA (Prior) on Mar 11, 2009 at 10:41 UTC
I don't think you can avoid a brute force approach Although I agree that the general problem is likely NP complete, there are probably at least some things one might do,singly or in combination, to pick out more or less likely solutions (and hence improve the likelihood that the first of N solutions contains something close to the optimal solution - i.e. we limit he number of solutions tried to N since we don't have infinite computing resources). I don't have the time to really think these through (or relearn my linear algebra), but someone else might like to pick up the ball on these ideas: There are probably some special groups of data-flow matrices that could be quickly solved analytically by simplifying the dataflow matrix using using eigenvectors and other linear algebra techniques. It might help to calculate the distribution of data flow values and use the distribution to prioritize which combinations of rows and columns should be tried first. Best, beth Update: in response to JavaFan's "I don't see how that could help" note below, added clarification of reason for picking more or less likely solutions.	[reply]
Re^4: "Divide" challenge by JavaFan (Canon) on Mar 11, 2009 at 11:02 UTC
It might help to calculate the distribution of data flow values and use the distribution to prioritize which combinations of rows and columns should be tried first. I don't see how that can help. It may help if you can eliminate large groups of possibilities, but as long as you have to try them all, it doesn't matter is which order you do them.	[reply]
Re^5: "Divide" challenge by ELISHEVA (Prior) on Mar 11, 2009 at 11:50 UTC
Re^3: "Divide" challenge by JavaFan (Canon) on Mar 11, 2009 at 10:56 UTC
Here's how it "scales". First column is the number of machines. Second column is the number of times you can divide that number of machines in two equally sized groups: use Math::BigInt; use 5.010; my $line = "+----+------------------------+"; say $line; printf "\| %2d \| %22s \|\n", 2 * $_, Math::BigInt->new(2 * $_)->bfac / (2 * Math::BigInt->new($_)->bfac +) for 1 .. 16; say $line; __END__ +----+------------------------+ \| 2 \| 1 \| \| 4 \| 6 \| \| 6 \| 60 \| \| 8 \| 840 \| \| 10 \| 15120 \| \| 12 \| 332640 \| \| 14 \| 8648640 \| \| 16 \| 259459200 \| \| 18 \| 8821612800 \| \| 20 \| 335221286400 \| \| 22 \| 14079294028800 \| \| 24 \| 647647525324800 \| \| 26 \| 32382376266240000 \| \| 28 \| 1748648318376960000 \| \| 30 \| 101421602465863680000 \| \| 32 \| 6288139352883548160000 \| +----+------------------------+ [download]	[reply] [d/l]
Re^2: "Divide" challenge by bellaire (Hermit) on Mar 11, 2009 at 12:09 UTC
With respect to scaling, I've tried some simple analysis to see what we're dealing with in terms of optimization. Using ELISHEVA's matrix generation code (and her observation that my code was doing twice as much work as necessary), I ran many random iterations asking the question: How much of an improvement is the optimal solution over the average of all possible solutions? I then decided to try an utterly simplistic solution algorithm that should scale as N^2 with respect to the number of machines (N), or linearly with respect to the number of data flow paths. That is, I simply pick N^2 combinations completely at random, and select the minimum cost from that set. I observed the improvement of this approach over the average as well. Here's my output for 100 iterations for N from 8 to 14: `N=8 Over 100 iterations, optimal is better than average by 21.07 percent random is better than average by 20.61 percent N=10 Over 100 iterations, optimal is better than average by 20.38 percent random is better than average by 18.64 percent N=12 Over 100 iterations, optimal is better than average by 19.35 percent random is better than average by 16.82 percent N=14 Over 100 iterations, optimal is better than average by 18.00 percent random is better than average by 15.08 percent` [download] From this data, there are two immediate assertions that I can make, even though my data set is still too small to be totally conclusive: The improvement of the optimal solution over the average decreases as N increases. The improvement of the random solution decreases as well, at a slightly higher rate than the optimal solution. However, if I had to go out on a limb, I think that the randomized solution, because it scales well, might be an acceptable substitute for brute force, especially as N grows larger and the brute force solution becomes prohibitive. At the very least, any heuristic approach would have to be at least as good as the random approach to be a serious candidate.	[reply] [d/l]