in reply to Average Price Algorithm

camelcom,
The following heuristic approach does amazingly well with the given data set. It could be improved further by not calculating the last fragment, since whatever is left over must go into it; I left it the way it is in case the total of all fragment counts could be less than the total quantity.

The algorithm works as follows:

1. Expand each "quantity @ cost" line into one entry per unit and sort the resulting pool by cost.
2. Compute the target average over the whole pool.
3. Fill the fragments from smallest count to largest: for each slot in a fragment, compute the single value that would bring that fragment's running average exactly to the target, binary search the sorted pool for it (falling back to the neighboring value that brings the average closest to the target), and move that value from the pool into the fragment.

The binary search could be improved, and some of the math is duplicated, so there are speed improvements to be had. There may also be bugs to squash, as I wrote it in a hurry.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'sum';
use Data::Dumper;

# Expand each "quantity @ cost" line into one entry per unit, then sort
my @data;
while (<DATA>) {
    chomp;
    my ($quantity, $cost) = split /\s*@\s*/;
    push @data, ($cost) x $quantity;
}
@data = sort { $a <=> $b } @data;

my %fragment = (
    A => {count => 65, ave => 0, items => 0},
    B => {count => 12, ave => 0, items => 0},
    C => {count => 24, ave => 0, items => 0},
    D => {count => 19, ave => 0, items => 0},
);

my $tgt_ave = sum(@data) / @data;

# Fill the smallest fragments first
for my $frag (sort { $fragment{$a}{count} <=> $fragment{$b}{count} } keys %fragment) {
    for (1 .. $fragment{$frag}{count}) {
        my ($cnt, $ave) = @{$fragment{$frag}}{qw/items ave/};
        # The value that would bring this fragment's running average
        # exactly to the target
        my $best = ($tgt_ave * $cnt) + $tgt_ave - ($ave * $cnt);
        my $idx  = find_best(\@data, $best, $ave, $tgt_ave, $cnt);
        my $val  = splice(@data, $idx, 1);
        #push @{$fragment{$frag}{val}}, $val;
        ++$fragment{$frag}{items};
        $fragment{$frag}{ave} = (($ave * $cnt) + $val) / ($cnt + 1);
    }
}

print Dumper(\%fragment);

# Binary search; if there is no exact match, pick the value that brings
# the fragment's average closest to the desired average
sub find_best {
    my ($data, $best, $ave, $tgt_ave, $cnt) = @_;
    my ($beg, $end, $mid) = (0, $#$data, undef);

    while ($beg <= $end) {
        $mid = int($beg + ($end - $beg) / 2);
        my $val = $data->[$mid];
        if ($val > $best) {
            $end = $mid - 1;
        }
        elsif ($val < $best) {
            $beg = $mid + 1;
        }
        else {
            return $mid;
        }
    }

    # No exact match: consider the landing spot and its neighbors
    my $minus_1 = $mid > 0       ? $mid - 1 : undef;
    my $plus_1  = $mid < $#$data ? $mid + 1 : undef;

    my ($min, $idx);
    for ($minus_1, $mid, $plus_1) {
        next if ! defined $_;
        my $val     = $data->[$_];
        my $new_ave = (($ave * $cnt) + $val) / ($cnt + 1);
        my $diff    = abs($tgt_ave - $new_ave);
        if (! defined $min || $diff < $min) {
            ($min, $idx) = ($diff, $_);
        }
    }
    return $idx;
}

__DATA__
5 @ 93.8
20 @ 93.81
10 @ 93.82
15 @ 93.83
25 @ 93.84
5 @ 93.85
20 @ 93.87
5 @ 94
35 @ 94.1
10 @ 94.2

Cheers - L~R

Re^2: Average Price Algorithm
by GrandFather (Saint) on Jan 29, 2009 at 02:36 UTC

    If you add the line:

    ++$fragment{$frag}{$val};

    following ++$fragment{$frag}{items}; in the innermost loop of the main block, you record the quantities at each price that were used to build each person's package.
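
    In context, the inner loop would then read something like this (only the tally line is new; note that the per-price counts share the fragment's hash with the count/ave/items keys, so any reporting code needs to skip those three):

        for (1 .. $fragment{$frag}{count}) {
            my ($cnt, $ave) = @{$fragment{$frag}}{qw/items ave/};
            my $best = ($tgt_ave * $cnt) + $tgt_ave - ($ave * $cnt);
            my $idx  = find_best(\@data, $best, $ave, $tgt_ave, $cnt);
            my $val  = splice(@data, $idx, 1);
            ++$fragment{$frag}{items};
            ++$fragment{$frag}{$val};    # tally units taken at this price
            $fragment{$frag}{ave} = (($ave * $cnt) + $val) / ($cnt + 1);
        }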

    Oh, small nit (that may be covered by "I left it the way it is ..."): you didn't allocate anything to E because it wasn't included in %fragment.


    Perl's payment curve coincides with its learning curve.
Re^2: Average Price Algorithm
by camelcom (Sexton) on Jan 29, 2009 at 07:33 UTC
    Thanks Limbic, but I'm not sure this approach will find the best solution. I know that, for the given example, there is a selection whose maximum error across all buckets is 0.0001754 (with three of the bucket averages exactly matching the overall average).

    Although this approach looks correct, often the best solution will be some blend of fragments which this methodology will not find. I'm thinking some kind of brute-force method may do even better, but I'm still working on options.

      (posted here instead of one level up because it seems to expand on the thoughts of the parent)

      ++ to Limbic~Region for the algorithm (grandparent node). I like it.

      It does appear, however, that it minimizes the error once for each individual fragment, rather than minimizing the error for some summation function (absolute error, statistical deviation, etc.) over the entire population.

      As a "good" solution, I think this is excellent, and it seems similar to a "good" solution to the bin packing problem that I remember from my undergrad days. It can be fooled by some nasty data.

      As with any other NP-hard or optimization problem, an approximation's "good enough" value is in the eye of the beholder, as is whether you are willing to calculate (or wait for) the optimum solution.

      You could add an optimization loop to the end of this algorithm, bounded by whatever time you are willing to wait (or some other limiting factor): start swapping values between the bins (picking which swaps will give the best change is left as an exercise; perhaps swap something from the best and the worst, or the two worst, or the worst over and the worst under), recalculate the fitness, save the state if it is better, and repeat while you still have cycles left. A sketch of that loop follows.
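
      Here is a minimal sketch of that idea, assuming each fragment's values are kept in an array (for instance by re-enabling the commented-out push in the grandparent's code); the fitness and improve names, the purely random swap selection, and the iteration bound are all placeholder choices:

      use strict;
      use warnings;
      use List::Util 'sum';

      # fitness: sum over all fragments of |fragment average - target|
      sub fitness {
          my ($frags, $tgt) = @_;
          return sum map { abs($tgt - sum(@$_) / @$_) } values %$frags;
      }

      # Try random swaps between two fragments, keeping only those
      # that improve the overall fitness
      sub improve {
          my ($frags, $tgt, $tries) = @_;
          my @names = keys %$frags;
          my $best  = fitness($frags, $tgt);
          for (1 .. $tries) {
              my ($f1, $f2) = @names[int rand @names, int rand @names];
              next if $f1 eq $f2;
              my ($i, $j) = (int rand @{$frags->{$f1}}, int rand @{$frags->{$f2}});
              ($frags->{$f1}[$i], $frags->{$f2}[$j]) =
                  ($frags->{$f2}[$j], $frags->{$f1}[$i]);
              my $new = fitness($frags, $tgt);
              if ($new < $best) { $best = $new }    # keep the swap
              else {                                # undo the swap
                  ($frags->{$f1}[$i], $frags->{$f2}[$j]) =
                      ($frags->{$f2}[$j], $frags->{$f1}[$i]);
              }
          }
          return $best;
      }

      # toy usage with made-up fragments
      my %frags = (A => [93.8, 94.2, 94.1], B => [93.81, 93.84]);
      printf "fitness: %f\n", improve(\%frags, 93.95, 10_000);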

      --MidLifeXis

      camelcom,
      This is a heuristic solution, and the problem appears to be NP-hard. That means it is not guaranteed to produce the best solution for every data set, which is why I said it did amazingly well rather than perfectly. You indicated you had an acceptable margin of error, and it looked like this could be made to fit your criteria. It is also obviously fast and can be improved.

      Cheers - L~R

Re^2: Average Price Algorithm
by camelcom (Sexton) on Jan 29, 2009 at 10:13 UTC
    I have checked the Dominus book. The partitioning problem shown there is solved recursively, but it differs slightly from this problem in that, here, you only know the quality of the fit once the overall selection for a bucket is made. Often there is some blend of allocations which fits really well, but it's easy to miss those if you get too clever with the algorithm.

    As for the measurement of fitness, I'm having trouble deciding, but I think the sum of the errors is reasonable. Certainly not the max error, because that will most often occur for allocations which are very small (1, 2, or 3).

    With monks help, I think I'm starting to get a better handle on this problem - I have an idea for a new approach based on a combination of a heuristic and brute-force.

    I work in an organisation which is (supposedly) stuffed full of smart people, but when I get stuck I always come to the monks first :)

      The sum of absolute errors, I think, is what you mean. Otherwise your fitness function would always return 0 ;-)

      You may be better off using one of the statistical error calculations (Portal:Statistics). IIRC (and stats was not my strong suit 15+ years ago, so apply salt as necessary), some of the more complex (well, more complex than summation, anyway) error functions would give "better" results; two candidates are sketched below.
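
      For illustration, here are the two fitness functions under discussion, applied to the per-fragment averages; the sample numbers are made up, and whether to weight each fragment by its item count is left open:

      use strict;
      use warnings;
      use List::Util 'sum';

      # sum of absolute errors: every fragment's miss counts linearly
      sub sum_abs_err { my ($tgt, @aves) = @_; sum map { abs($_ - $tgt) } @aves }

      # sum of squared errors: large misses are penalized disproportionately,
      # nudging an optimizer toward evening out the error across fragments
      sub sum_sq_err  { my ($tgt, @aves) = @_; sum map { ($_ - $tgt)**2 } @aves }

      # made-up fragment averages, purely to exercise the functions
      my @aves = (93.9482, 93.9471, 93.9475, 93.9469);
      printf "abs: %.6f  sq: %.10f\n",
          sum_abs_err(93.9474, @aves), sum_sq_err(93.9474, @aves);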

      --MidLifeXis