spencoid has asked for the wisdom of the Perl Monks concerning the following question:
Re: optimization problem (Knapsack)
by LanX (Saint) on Jun 10, 2018 at 20:21 UTC
Sounds like the Knapsack problem and family ... but I'm not sure about your side conditions.
Edit: ah, sounds good: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. So give each item a value of -1, or a negative number representing its precision. You'll find plenty of implementations.
Update: ... though be aware that it's NP-hard; runtime will depend on how good your result has to be and whether you need a guaranteed optimal solution.
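None of what follows is from LanX's post; it is only a minimal dynamic-programming sketch of the exact-sum variant with unlimited reuse of each weight, assuming the weights have already been scaled to integers (e.g. milligrams). The @weights and $target values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy illustration: find the fewest pieces that sum exactly to the target,
# each weight usable any number of times, all values in integer milligrams.
my @weights = ( 10, 25, 503, 1006 );     # made-up values
my $target  = 1544;

my @min_pieces = (0);      # $min_pieces[$sum] = fewest pieces reaching exactly $sum
my @last_used  = (undef);  # which weight was added last to reach $sum

for my $sum ( 1 .. $target ) {
    for my $w (@weights) {
        next if $w > $sum or !defined $min_pieces[ $sum - $w ];
        my $cand = $min_pieces[ $sum - $w ] + 1;
        if ( !defined $min_pieces[$sum] or $cand < $min_pieces[$sum] ) {
            $min_pieces[$sum] = $cand;
            $last_used[$sum]  = $w;
        }
    }
}

if ( defined $min_pieces[$target] ) {
    my @picks;
    for ( my $s = $target; $s > 0; $s -= $last_used[$s] ) {
        push @picks, $last_used[$s];
    }
    print "reached $target mg with ", scalar @picks, " pieces: @picks\n";
}
else {
    print "target $target mg is not reachable exactly\n";
}
```

Scaling the real weights to integers first keeps the table a manageable size and avoids floating-point comparisons.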
Cheers Rolf
by spencoid (Acolyte) on Jun 10, 2018 at 23:34 UTC
Re: optimization problem
by haukex (Archbishop) on Jun 10, 2018 at 21:40 UTC
"My current program demonstrates this very well." Could you show us that program, or at least a short version of it? (SSCCE) Also, this sounds like a case where you might benefit from a test-first approach: even if you don't have the body of your function yet, you can still write tests using Test::More to test this hypothetical function. Not only will it help you see how close you're getting while you're developing, it'll help us understand your problem description better :-)
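For instance, a test-first skeleton might look like the following; pick_weights() and its signature are invented here purely for illustration, not taken from the thread.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical interface: pick_weights() takes a target, a tolerance, and a
# list of available weights, and returns the weights it chose (or an empty
# list if no combination fits).
sub pick_weights { die "not implemented yet" }

my @picked = eval { pick_weights( 1.06, 0.005, 0.01, 0.05, 1.00, 100.6 ) };

SKIP: {
    skip 'pick_weights() is not implemented yet', 2 if $@;
    my $sum = 0;
    $sum += $_ for @picked;
    cmp_ok( abs( $sum - 1.06 ), '<=', 0.005, 'total is within tolerance' );
    cmp_ok( scalar @picked, '<=', 3, 'uses a small number of weights' );
}
```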
"If it were all in whole numbers it would be trivial." In my experience integers are much easier to work with, and you don't suffer from floating-point imprecision. I'd suggest switching to milligrams, micrograms, etc. - for example, my Perl (64 bit) can handle integers up to 9007199254740992 without loss of precision. (source)
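A small sketch of that conversion (values invented), assuming milligram precision is sufficient:

```perl
use strict;
use warnings;

# Convert grams to whole milligrams once, on input, then do all further
# arithmetic and comparisons on integers.
my @grams = ( 0.01, 0.05, 1.00, 100.6 );
my @mg    = map { 0 + sprintf( '%.0f', $_ * 1000 ) } @grams;   # 10, 50, 1000, 100600
my $total_mg = 0;
$total_mg += $_ for @mg;
print "total: $total_mg mg\n";   # exact integer arithmetic from here on
```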
Re: optimization problem
by BrowserUk (Patriarch) on Jun 11, 2018 at 00:58 UTC
"I could probably do it by brute force but also do not know how to approach that either." Let's assume for a moment that the problem was to use any of your (small, but for discussion only) test set at most once each to achieve the target value. With such a small test set, and the once-only criterion, the problem is simple and fast:
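BrowserUk's code is not reproduced in this thread dump; the following is only a rough sketch of such a once-only brute force, with an invented value set, target and tolerance.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try every non-empty subset of a small value set (each value used at most
# once) and keep the subsets whose sum lands within a tolerance of the target.
my @values    = ( 0.01, 0.05, 0.1, 0.5, 1.0, 10.6, 100.6 );
my $target    = 101.15;
my $tolerance = 0.005;

my @hits;
for my $mask ( 1 .. 2**@values - 1 ) {
    my @subset = map { $values[$_] } grep { $mask & ( 1 << $_ ) } 0 .. $#values;
    my $sum = 0;
    $sum += $_ for @subset;
    push @hits, [@subset] if abs( $sum - $target ) <= $tolerance;
}

printf "%d combination(s) within %.3f of %.2f\n", scalar @hits, $tolerance, $target;
print join( ' + ', @$_ ), "\n" for @hits;
```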
The results:
Now let's loosen that once-only criterion and say that any of your values can be used twice. Here's the headline statistic (the full data is not interesting):
Note how the number of combinations has risen from 254 to 4817, a 19x increase; but the significant thing is the time taken: an increase from 0.004 seconds to 1.284 seconds, a 321x increase. So, extrapolating for -M=3 (i.e. each value can be used at most 3 times) we get: number of results: 4817 * 19 = 91,523; and the time would be: 1.284 x 321 = 412.164 seconds. However, when we run the code we get:
4817 to 16,777,214 is a 3483x increase. (And don't get excited about the 173 seconds; a 16.7-million-key hash of 7-element arrays requires enough memory to push my 16GB machine into swapping, so I commented out the bit that stored the results.) The point here is that the growth in size and time is not geometric (multiplicative), nor even exponential (powers), but geometrically exponential (n**m). It grows very quickly.

Imagine that the tolerance on a 0.01 part was, percentage-wise, fractionally less than the tolerance on a 100.6 part. In other words, to accumulate a total of 100.61, it would be better (according to your tolerance criteria) to use 10061 x 0.01 parts than to use 1 x 100.6 part + 1 x 0.01 part. If that were the case, then to brute-force a solution would require you to allow each of your 7 values to be (re)used up to 10,061 times each. Now, if -M=3 would generate 16.7 million possibilities, imagine the size of the numbers that -M=10,061 would generate! (And the time required to generate them all.)

At this point you may be thinking this is impossible. But ... the saving grace may be those tolerances. If you told us the maximum final tolerance, and the per-part tolerances, then we could limit the number of each individual part by simply dividing the maximum final tolerance by the tolerance per part. If the tolerance values are real(istic), rather than faux values chosen to deliberately complicate the solution, then we could use those tolerances to constrain the per-component maximum reuse counts such that the brute-force time/memory requirements would be within the bounds of reality.

Bottom line: if this is a real problem you need to solve -- rather than an academic exercise designed to frustrate a solution -- then if you supply us with the full criteria, a reasonable (time/accuracy/machine specification) solution might be possible. However, if this is an academic exercise designed to create a plausible, but ultimately insoluble, problem, then all you are doing by posting it is wasting all our time. And that is the ultimate sin on a site that prides itself on providing solutions, rather than airy-fairy allusions to possible academic approaches to the problem, without either code, or even an assessment of the probability that such academic objets d'art might lead to an at least feasible, if not practical, solution.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
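As a sketch of that last idea (all numbers invented), the per-part reuse cap is just a division:

```perl
use strict;
use warnings;

# Cap how many times each part may be reused by dividing the allowed final
# tolerance by that part's own tolerance.
my $max_final_tolerance = 0.05;                          # hypothetical overall budget
my %part_tolerance      = ( '0.01' => 0.002, '100.6' => 0.02 );
my %max_uses = map { $_ => int( $max_final_tolerance / $part_tolerance{$_} ) }
               keys %part_tolerance;
print "$_ part: at most $max_uses{$_} uses\n" for sort keys %max_uses;
# e.g. the 0.01 part could be used at most 25 times, the 100.6 part at most twice
```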
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: optimization problem
by Laurent_R (Canon) on Jun 10, 2018 at 22:53 UTC
Second, it is relatively easy to solve such a problem for a (very) small number of weights, but it is known to be a very difficult problem when the number of weights grows larger. In fact, most computer scientists believe that there is probably no workable solution for just a few hundred random values (brute force would probably take billions of years to calculate because of an exponential explosion). In other words, the ability to solve such a problem depends very much on your input data, especially on the number of weights that you have (and also on the data shape). Please provide as much detailed information on the input data as you can (especially the maximum number of values).
by LanX (Saint) on Jun 11, 2018 at 10:02 UTC
He didn't provide many details.
Cheers Rolf
by Laurent_R (Canon) on Jun 11, 2018 at 18:01 UTC
Re: optimization problem
by bliako (Abbot) on Jun 11, 2018 at 11:45 UTC
Sometimes I am a one-trick pony, so I scripted a quick Genetic Algorithms solution to this problem, based on something I posted earlier about solving sets of linear equations in the thread Curious about Perl's strengths in 2018. My solution uses the good module Algorithm::Evolutionary.

Essentially the program tries to find an optimum combination of base weights so that a target sum weight is reached. It also tries to minimise the number of base weights used, but it does not take tolerances into account at the moment, because there is no use in me guessing the spec when the poster can clarify. So for the time being, set IMPORTANCE_OF_MINIMISING_NUM_WEIGHT_USED_OVER_DISCREPANCY to reflect this trade-off: a lower value favours hitting the target sum. It works with a small (4) and a larger (17) number of weights in less than 10 seconds on my machine. Running time depends on how strict the stopping criterion is and on the maximum number of iterations allowed in case the criterion is not reached. Memory requirement is very low, but it depends on the size of the GA population.

Genetic Algorithms (GA) may or may not work. It all comes down to the vastness of the search space and the number of dimensions. For this particular case I find that GA works superbly. A gene is a bitstring (8 bits for my program) and represents a weight multiplier (which means it will be in the range 0 to 255). A chromosome is a set of N genes for N base weights and represents a candidate solution. GA keeps a number of chromosomes as candidate solutions in the genetic pool, so to speak, and assesses their fitness using a user-supplied function. This function basically calculates the discrepancy between the weight sum and the target, and modulates that so that it decreases as the error gets smaller. GA then recombines chromosomes (lets them have lots of sex, basically) to get a new solution based on the parents' genes. And so on.

Notice that genes are just bits: 0/1. A solution is just bits. The interpretation of these bits as numbers for the particular problem is up to the user, who supplies the fitness function (given the genes). This is certainly a big plus for GA, often overlooked, because one can immediately solve integer-only problems (i.e. here the multipliers must be integers), or unsigned-integer-only, or floats, fractions, etc. And then we have strings, etc. (Side project: make a Markov chain text generator which scores high on sentiment analysis.)

Here is my script. It runs without parameters: Read more... (17 kB)
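bliako's 17 kB script is collapsed above and not reproduced here. Purely to illustrate the approach he describes, here is a toy GA in plain Perl; it is not his code, does not use Algorithm::Evolutionary, and the weights, target and GA parameters are all invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum0);

# Chromosome = bitstring of N genes x 8 bits; each gene decodes to a
# multiplier 0..255 for the corresponding base weight.
my @weights = ( 0.01, 0.05, 0.1, 0.5, 1.0, 10.6, 100.6 );
my $target  = 113.37;
my $penalty = 0.01;            # importance of using few pieces vs hitting the target

my $genes    = scalar @weights;
my $bits     = 8;
my $pop_size = 100;
my $mut_rate = 0.01;
my $gens     = 200;

sub random_chromosome { join '', map { int rand 2 } 1 .. $genes * $bits }

sub decode {                   # bitstring -> list of integer multipliers
    my ($chrom) = @_;
    return map { oct( '0b' . substr( $chrom, $_ * $bits, $bits ) ) } 0 .. $genes - 1;
}

sub fitness {                  # higher is better
    my ($chrom) = @_;
    my @mult = decode($chrom);
    my $sum  = sum0( map { $mult[$_] * $weights[$_] } 0 .. $#mult );
    my $used = sum0(@mult);    # total number of pieces placed
    return -( abs( $sum - $target ) + $penalty * $used );
}

my @pop = map { random_chromosome() } 1 .. $pop_size;

for my $gen ( 1 .. $gens ) {
    my @next;
    while ( @next < $pop_size ) {
        # tournament selection: the fitter of two random individuals, twice
        my @parents = map {
            my ( $p, $q ) = ( $pop[ int rand @pop ], $pop[ int rand @pop ] );
            fitness($p) > fitness($q) ? $p : $q;
        } 1 .. 2;

        # single-point crossover
        my $cut   = 1 + int rand( $genes * $bits - 1 );
        my $child = substr( $parents[0], 0, $cut ) . substr( $parents[1], $cut );

        # per-bit mutation
        $child = join '', map { rand() < $mut_rate ? 1 - $_ : $_ } split //, $child;
        push @next, $child;
    }
    @pop = @next;
}

my ($best) = sort { fitness($b) <=> fitness($a) } @pop;
my @mult   = decode($best);
my $sum    = sum0( map { $mult[$_] * $weights[$_] } 0 .. $#mult );
printf "best: sum=%.3f (target %.3f), multipliers: %s\n", $sum, $target, "@mult";
```

Being a stochastic search, this gives a good (not guaranteed optimal) combination; tolerances and a real stopping criterion would still need to be added.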
bw, bliako