With respect to scaling, I've done some simple analysis to get a feel for what we're up against in terms of optimization. Using ELISHEVA's matrix generation code (and her observation that my code was doing twice as much work as necessary), I ran many random iterations, asking: how much better is the optimal solution than the average over all possible solutions? I then tried an utterly simplistic alternative that should scale as N^2 in the number of machines (N), which is linear in the number of data flow paths, since there are on the order of N^2 paths among N machines: simply pick N^2 combinations completely at random and keep the minimum cost from that set. I measured the improvement of this approach over the average as well (a rough sketch of the experiment is at the end of this writeup). Here's my output for 100 iterations, for N from 8 to 14:
N=8
Over 100 iterations, optimal is better than average by 21.07 percent
random is better than average by 20.61 percent
N=10
Over 100 iterations, optimal is better than average by 20.38 percent
random is better than average by 18.64 percent
N=12
Over 100 iterations, optimal is better than average by 19.35 percent
random is better than average by 16.82 percent
N=14
Over 100 iterations, optimal is better than average by 18.00 percent
random is better than average by 15.08 percent
From this data, I can make two immediate assertions, even though my sample is still too small to be conclusive:
- The improvement of the optimal solution over the average decreases as N increases.
- The improvement of the random solution decreases as well, at a slightly faster rate than that of the optimal solution.
However, if I had to go out on a limb, I think that the randomized solution, because it scales well, might be an acceptable substitute for brute force, especially as N grows larger and the brute force solution becomes prohibitive. At the very least, any heuristic approach would have to be at least as good as the random approach to be a serious candidate.
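For anyone who wants to poke at this, here's a rough Python sketch of the experiment. It uses a simplified stand-in of my own for the matrix generation and cost model rather than the actual code: a solution here is just a pairing-up of the machines, and its cost is the sum of the matrix entries for the chosen pairs. The shape of the comparison (brute force vs. average vs. best of N^2 random picks) is the part that matters; the details are illustrative only.

```python
import itertools
import random


def random_cost_matrix(n, rng):
    """Symmetric N x N matrix of random path costs (a stand-in for
    ELISHEVA's matrix generation code -- the real cost model may differ)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rng.random()
    return m


def all_pairings(machines):
    """Yield every way of pairing up the machines. Treating a solution as a
    set of machine pairs is an assumption of this sketch, not the real
    solution space."""
    if not machines:
        yield []
        return
    first, rest = machines[0], machines[1:]
    for i, partner in enumerate(rest):
        for tail in all_pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def cost(matrix, pairing):
    """Cost of a solution: sum of the matrix entries for the chosen pairs."""
    return sum(matrix[a][b] for a, b in pairing)


def random_pairing(machines, rng):
    """Pick one solution uniformly at random (N assumed even here)."""
    shuffled = machines[:]
    rng.shuffle(shuffled)
    return list(zip(shuffled[::2], shuffled[1::2]))


def trial(n, rng):
    """One iteration: optimal, best of N^2 random samples, and average cost."""
    matrix = random_cost_matrix(n, rng)
    machines = list(range(n))
    # Brute force: cost of every possible solution.
    costs = [cost(matrix, p) for p in all_pairings(machines)]
    average = sum(costs) / len(costs)
    optimal = min(costs)
    # Simplistic alternative: sample N^2 solutions at random, keep the cheapest.
    sampled = min(cost(matrix, random_pairing(machines, rng))
                  for _ in range(n * n))
    return optimal, sampled, average


def pct_better(c, average):
    """Percent improvement of a cost over the average cost."""
    return 100.0 * (average - c) / average


if __name__ == "__main__":
    rng = random.Random(1)
    iterations = 100
    for n in (8, 10):  # the full run above went to 14; brute force gets slow here
        opt = smp = 0.0
        for _ in range(iterations):
            o, s, a = trial(n, rng)
            opt += pct_better(o, a)
            smp += pct_better(s, a)
        print(f"N={n}")
        print(f"Over {iterations} iterations, optimal is better than average "
              f"by {opt / iterations:.2f} percent")
        print(f"random is better than average by {smp / iterations:.2f} percent")
```

Swapping in the real cost function and solution generator shouldn't change the structure of the comparison; only trial() and the helpers it calls would need to know about them.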