spencoid has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl script to which I need to add a capability that is well beyond my current thinking ability. I am not even sure where to look for suggestions, so I will try to state the problem and would welcome any pointers on where to start coding or where to look for information on solving this sort of problem. It is sort of a big puzzle.

I have a number of sizes of weights; for argument's sake (in actuality they will be decimal values with many places), say they are .01 .12 1.1 2.2 5.3 10.5 50.1 100.6 grams. Each weight also has a known precision, but that complexity will only come into the program once I figure out the basics. I need to write a program that calculates the smallest total number of weights to use to obtain a total that is as close to a desired value as possible. If it were all in whole numbers it would be trivial. The reason I cannot just use a huge number of the smallest weight is that each weight has an allowable error, and if a huge number of them is used, the possible error also gets huge. My current program demonstrates this very well.

I have a script that does the calculations with the actual weights I will be using, but the user has to follow a trial-and-error process to get the closest match with the best precision, and it involves a lot of fiddling with no assurance of finding the best solution. I could probably keep playing with the script and get a clue as to what my thinking is, as a relatively intelligent human, and try to write a program to mimic what I have found to be a good approach, but I think I would be relying on intuition and have no idea how to program that. I could probably do it by brute force, but I do not know how to approach that either.

Replies are listed 'Best First'.
Re: optimization problem (Knapsack)
by LanX (Saint) on Jun 10, 2018 at 20:21 UTC
    > I am not even sure where to look for suggestions

    Sounds like Knapsack problem and family ... but I'm not sure about your side conditions.

    edit

    ah sounds good:

    Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

    So give each item a value of -1 (so that maximising the total value minimises the number of weights used), or a negative number representing its precision.

    You'll find plenty of implementations
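
    For illustration only, here is a minimal unbounded coin-change style dynamic-programming sketch in Perl. It assumes the weights have already been scaled to integer units, ignores the per-weight tolerances entirely, and the target value is made up (the weight set is the example set scaled by 100); it shows the shape of the standard "fewest pieces" recurrence, not a solution to the full problem.

    # Unbounded coin-change style sketch: for every reachable integer total up
    # to the target, record the fewest pieces that sum to it.
    use strict;
    use warnings;

    my @weights = ( 1, 12, 110, 220, 530, 1050, 5010, 10060 );   # example weights x 100
    my $target  = 12345;                                         # made-up target

    my @min_count = ( 0, (undef) x $target );    # $min_count[$t] = fewest pieces summing to $t
    my @last_used = (undef) x ( $target + 1 );   # which weight reached $t (for backtracking)

    for my $t ( 1 .. $target ) {
        for my $w (@weights) {
            next if $w > $t or !defined $min_count[ $t - $w ];
            my $c = $min_count[ $t - $w ] + 1;
            if ( !defined $min_count[$t] or $c < $min_count[$t] ) {
                $min_count[$t] = $c;
                $last_used[$t] = $w;
            }
        }
    }

    # Walk down from the target to the nearest total that is actually reachable.
    my $best = $target;
    $best-- while $best > 0 and !defined $min_count[$best];

    # Backtrack to list the weights used.
    my @pieces;
    my $t = $best;
    while ( $t > 0 ) {
        push @pieces, $last_used[$t];
        $t -= $last_used[$t];
    }

    printf "closest reachable total %d using %d pieces: %s\n",
        $best, scalar @pieces, join ' + ', @pieces;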

    update

    ... though be aware that it's NP-hard; runtime will depend on how good your result has to be and whether you need a guaranteed optimal solution.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Thanks for the hints. It does seem like a type of knapsack problem or a coin-change problem, although there are some specifics that may make it a little more complex. I also did some fiddling with the script to see what strategy I would use as a human, and to determine whether I could use that as a basis from which to develop code. The weights do (sort of) have two attributes, weight and precision. Precision could probably be equated with value, but I really cannot wrap my head around this one :) The weight examples I gave are not representative of the true problem; the real weights run from smallest to largest and the values have 5 or 6 decimal places.

      A strategy that worked fairly well by hand is very simple: as in the coin-change example I found online, add the largest possible number of the largest weight, as long as the total does not exceed the target value, then continue with lesser weights, and finally add the number of the smallest weights. The smallest weight is quite small and there are a lot of intermediates, so it might make sense to keep it as simple as possible and not worry about the perfect solution. Once the weights are chosen by this method it is never possible to substitute more larger weights for smaller ones, of course, but as in the coin-change problem it might be possible to get closer with more smaller weights if the remainder can then be satisfied with fewer even smaller ones. I can very easily code this simplistic solution and fiddle around to see how much improvement might be possible; I might even get an idea of how to test for a better solution once the simple method is used.

      I have not used much math in many years and could not follow most of the online examples. There was a Python example, but the syntax is so different from Perl or any language I have used that it would take me forever to figure out. I will search for the Perl coin-change problem and similar topics once my satellite internet connection improves after 2:00 AM, but if anyone has links to examples of the coin-change or knapsack problems written in Perl, that would be very much appreciated.
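
      A minimal sketch of that greedy largest-first strategy (illustrative only; the target value below is made up, and this is exactly the approach that can miss better combinations, as noted above):

      #!/usr/bin/perl
      # Illustrative sketch of the greedy strategy described above: take as
      # many of the largest weight as fit under the target, then move on to
      # the next smaller weight, and so on.  (Floating point is used here for
      # brevity; see haukex's integer suggestion below.)
      use strict;
      use warnings;

      my @weights = sort { $b <=> $a } qw( 0.01 0.12 1.1 2.2 5.3 10.5 50.1 100.6 );
      my $target  = 137.45;    # made-up target value

      my $total = 0;
      my %used;
      for my $w (@weights) {
          my $n = int( ( $target - $total ) / $w + 1e-9 );   # how many of this size still fit
          next unless $n > 0;
          $used{$w} = $n;
          $total   += $n * $w;
      }

      printf "greedy total %.2f (remainder %.2f): %s\n",
          $total, $target - $total,
          join ', ', map { "$used{$_} x $_" } sort { $b <=> $a } keys %used;
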
Re: optimization problem
by haukex (Archbishop) on Jun 10, 2018 at 21:40 UTC
    My current program demonstrates this very well.

    Could you show us that program, or at least a short version of it? (SSCCE) Also, this sounds like a case where you might benefit from a test-first approach: even if you don't have the body of your function yet, you can still write tests using Test::More to test this hypothetical function. Not only will it help you see how close you're getting while you're developing, it'll help us understand your problem description better :-)

    use warnings;
    use strict;
    use Test::More;

    sub my_func {
        my $target  = shift;
        my @weights = @_;
        my @output  = ();
        # TODO!
        return \@output;
    }

    is_deeply my_func( 30, 7, 10, 13 ), [ 10, 10, 10 ];
    is_deeply my_func(  5, 3,  7, 11 ), [ 3, 3 ];
    # TODO: lots more test cases!

    done_testing;
    If it were all in whole numbers it would be trivial.

    In my experience integers are much easier to work with, and you don't suffer from floating-point imprecisions. I'd suggest switching to milligrams, micrograms, etc. - for example, my Perl (64 bit) can handle integers up to 9007199254740992 without loss of precision. (source)
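
    For example, a minimal sketch of that scaling step (the factor of 1,000,000 is just an assumption based on the "5 or 6 decimal places" mentioned above):

    # Illustrative only: scale the decimal gram values to integer micrograms
    # once, up front, and do all further arithmetic in integers.
    use strict;
    use warnings;

    my @grams      = qw( 0.01 0.12 1.1 2.2 5.3 10.5 50.1 100.6 );
    my @micrograms = map { sprintf '%.0f', $_ * 1_000_000 } @grams;

    print "@micrograms\n";   # 10000 120000 1100000 2200000 5300000 10500000 50100000 100600000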

Re: optimization problem
by BrowserUk (Patriarch) on Jun 11, 2018 at 00:58 UTC
    I could probably do it by brute force but also do not know how to approach that either.

    Let's assume for a moment that the problem was to use any of your (small, but for discussion only) test set at most once each to achieve the target value. With such a small test set, and the once-only criterion, the problem is simple and fast:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    $Data::Dump::WIDTH = 1000;
    use List::Util qw[ sum ];
    use Algorithm::Combinatorics qw[ combinations ];
    use Time::HiRes qw[ time ];

    =comment

    I have a number of sizes of weights, for argument purposes, say they are
    .01 .12 1.1 2.2 5.3 10.5 50.1 100.6 gram.

    Each weight will also have a known precision but that complexity will only
    be used in the program once I figure out the basics.

    I need to write a program that calculates the smallest total number of
    weights to use to obtain a total that is as close to a desired as possible.

    =cut

    our $M //= 1;

    my $start = time;
    my %results;
    my @vals = ( qw[ 0.01 0.12 1.1 2.2 5.3 10.5 50.1 100.6 ] ) x $M;

    for my $k ( 1 .. $#vals ) {
        my $iter = combinations( \@vals, $k );
        while( $_ = $iter->next ) {
            push @{ $results{ sum @$_ } }, [ @$_ ];
        }
    }

    printf "Finding all %d results took %6.3f seconds\n",
        scalar( keys %results ), time() - $start;
    printf "<Enter> to see them all";
    <STDIN>;

    pp \%results;

    __END__
    C:\test>1216334 -M=1
    Finding all 254 results took 0.004 seconds
    <Enter> to see them all

    The results:

    Now let's loosen my additional criterion and say that any of your values can be used up to twice. Here's the headline statistic (the full data is not interesting):

    C:\test>1216334 -M=2
    Finding all 4817 results took 1.284 seconds
    <Enter> to see them all
    Terminating on signal SIGINT(2)

    Note how the number of combinations has risen from 254 to 4817, a 19x increase; but the significant thing is the time taken: an increase from 0.004 seconds to 1.284 seconds, a 321x increase.

    So, extrapolating for -M=3 (i.e. each value can be used at most 3 times) we would expect the number of results to be 4817 * 19 = 91,523, and the time to be roughly 1.284 x 321 ≈ 412 seconds.

    However, when we run the code we get:

    C:\test>1216334 -M=3
    Finding all 16777214 results took 173.459 seconds

    4817 to 16,777,214 is a 3483x increase. (And don't get excited about the 173 seconds; a 16.7-million-key hash of 7-element arrays requires enough memory to push my 16GB machine into swapping, so I commented out the bit that stored the results.)


    The point here is that the growth in size and time is not geometric (multiplicative), nor even exponential (powers), but geometrically exponential: on the order of n**m. It grows very quickly.

    Imagine that the tolerance on a 0.01 part was, percentage-wise, fractionally less than the tolerance on a 100.6 part. In other words, to accumulate a total of 100.61, it would be better (according to your tolerance criteria) to use 10,061 x 0.01 parts than to use 1 x 100.6 part + 1 x 0.01 part.

    If that were the case, then brute-forcing a solution would require you to allow each of your values to be (re)used up to 10,061 times.

    Now, if -M=3 would generate 16.7 million possibilities, imagine the size of the numbers that -M=10,061 would generate! (And the time required to generate them all.)

    At this point you may be thinking this is impossible.

    But ... the saving grace may be those tolerances.

    If you told us the maximum final tolerance, and the per part tolerances, then we could limit the number of each individual part, by simply dividing the maximum final tolerance by the tolerance per part. If the tolerance values are real(istic), rather than faux values chosen to deliberately complicate the solution, then we could use those tolerances to constrain the per component maximum reuse counts such that the brute force time/memory requirements would be within the bounds of reality.
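
    For example, a minimal sketch of that bounding step; every tolerance figure below is invented purely for illustration:

    # Illustrative sketch only: every tolerance figure here is invented.
    # If the finished total may be off by at most $max_err, and one part of a
    # given size can contribute up to $err{$size} of error, then no candidate
    # solution needs more than int( $max_err / $err{$size} ) parts of that size.
    use strict;
    use warnings;

    my $max_err = 0.05;   # made-up overall tolerance, in grams
    my %err = ( '0.01' => 0.002, '0.12' => 0.005, '1.1' => 0.01, '100.6' => 0.05 );

    my %max_count = map { $_ => int( $max_err / $err{$_} ) } keys %err;
    printf "%-6s may be used at most %d times\n", $_, $max_count{$_}
        for sort { $a <=> $b } keys %max_count;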

    Bottom line: if this is a real problem you need to solve -- rather than an academic exercise designed to frustrate solution -- then if you supply us with the full criteria, a reasonable (time/accuracy/machine specification) solution might be possible.

    However, if this is an academic exercise designed to create a plausible, but ultimately insoluble, problem, then all you are doing by posting it is wasting all our time.

    And that is the ultimate sin on a site that prides itself on providing solutions, rather than airy-fairy allusions to possible academic approaches to the problem, with neither code nor even an assessment of the probability that such academic objets d'art might lead to an at least feasible, if not practical, solution.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: optimization problem
by Laurent_R (Canon) on Jun 10, 2018 at 22:53 UTC
    First, I don't think it would be trivial with whole numbers.

    Second, it is relatively easy to solve such a problem for a (very) small number of weights, but it is known to be a very difficult problem when the number of weights grows larger. In fact, most computer scientists believe that there is probably no workable solution for just a few hundred random values (brute force would probably take billions of years to calculate because of an exponential explosion). In other words, the ability to solve such a problem depends very much on your input data, especially on the number of weights that you have (and also on the data shape). Please provide as much detailed information on the input data as you can (especially the maximum number of values).

      It depends on whether the OP needs an optimal solution or just a reasonably "good" one.

      He didn't provide many details.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Yes, granted, it depends on that, as well as on a number of other factors. I was really saying that a general solution with a large random data set can be extremely difficult or even next to impossible. But, depending on the data, there may be ways to considerably reduce the complexity, for example if it is possible to discard early a large proportion of the would-be solutions. But we need much more information to be able to decide on whether it is possible to work out such a strategy.
Re: optimization problem
by bliako (Abbot) on Jun 11, 2018 at 11:45 UTC

    Sometimes I am a one-trick pony, so I scripted a quick Genetic Algorithms solution to this problem, based on something I posted earlier about solving sets of linear equations in the thread Curious about Perl's strengths in 2018. My solution uses the good module Algorithm::Evolutionary.

    Essentially the program tries to find an optimum combination of base weights so that a target sum weight is reached. It also tries to minimise the number of base weights used, but it does not take tolerances into account at the moment, because there is no point in my guessing the spec when the poster can clarify it. So for the time being, set IMPORTANCE_OF_MINIMISING_NUM_WEIGHT_USED_OVER_DISCREPANCY to reflect this trade-off; a smaller value favours matching the target sum.

    It works with both a small (4) and a large (17) number of weights in less than 10 seconds on my machine. Running time depends on how strict the stopping criterion is and on the maximum number of iterations allowed in case the criterion is not reached. The memory requirement is very low, but it depends on the size of the GA population.

    Genetic Algorithms (GA) may or may not work; it all comes down to the vastness of the search space and the number of dimensions. For this particular case I find that GA works superbly.

    A gene is a bitstring (8 bits in my program) and represents a weight multiplier, which means it will be in the range 0 to 255. A chromosome is a set of N genes for N base weights and represents a candidate solution. The GA keeps a number of chromosomes as candidate solutions in the genetic pool, so to speak, and assesses their fitness using a user-supplied function. This function basically calculates the discrepancy between the weight sum and the target, and transforms it so that fitness increases as the error gets smaller. The GA then recombines chromosomes (lets them have lots of sex, basically) to produce new candidate solutions from the parents' genes. And so on.
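
    A plain-Perl sketch of that decode-and-score idea (this is not the actual script, which uses Algorithm::Evolutionary; the target value and the penalty constant are invented for illustration):

    # Plain-Perl sketch of the decode-and-score idea described above.
    use strict;
    use warnings;
    use List::Util qw( sum );

    my @base_weights = qw( 0.01 0.12 1.1 2.2 5.3 10.5 50.1 100.6 );
    my $target       = 137.45;   # made-up target
    my $PENALTY      = 0.01;     # how strongly using many parts reduces fitness

    # A chromosome is one 8-bit gene (multiplier 0..255) per base weight.
    sub decode {
        my ($bitstring) = @_;                       # e.g. "00000011" repeated
        return map { oct "0b$_" } unpack '(A8)*', $bitstring;
    }

    sub fitness {
        my ($bitstring) = @_;
        my @multipliers = decode($bitstring);
        my $total       = sum map { $multipliers[$_] * $base_weights[$_] } 0 .. $#base_weights;
        my $num_used    = sum @multipliers;
        my $discrepancy = abs( $total - $target );
        # Higher is fitter: both a small discrepancy and few parts help.
        return 1 / ( 1 + $discrepancy + $PENALTY * $num_used );
    }

    my $chromosome = join '', map { sprintf '%08b', int rand 256 } @base_weights;
    printf "fitness of one random chromosome: %.6f\n", fitness($chromosome);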

    Notice that genes are just bits: 0/1. A solution is just bits. The interpretation of those bits as numbers for a particular problem is up to the user, who supplies the fitness function (given the genes). This is certainly a big plus for GA, and often overlooked, because one can immediately solve integer-only problems (here, for example, the multipliers must be integers), or unsigned-integer-only problems, or ones involving floats, fractions, etc. And then there are strings, etc. (Side project: make a Markov-chain text generator which scores high on sentiment analysis.)

    Here is my script. It runs without parameters:

    bw, bliako