in reply to Average Price Algorithm

camelcom,
The following heuristic approach does amazingly well with the given data set. It could be improved further by not calculating the last fragment, since whatever is left over must go into it; I left it the way it is in case the total of all fragment counts could be less than the total quantity.

The algorithm works as follows:

1. Expand each "quantity @ cost" line into one entry per unit and sort the resulting pool by cost.
2. Compute the target average over the whole pool.
3. Fill the fragments from smallest count to largest: for each slot in a fragment, compute the single value that would bring that fragment's running average exactly to the target, binary search the sorted pool for it (falling back to the neighboring value that brings the average closest to the target), and move that value from the pool into the fragment.

The binary search could be improved, and some of the math is duplicated, so there are speed improvements to be had. There may also be bugs to squash, as I wrote it in a hurry.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'sum';
use Data::Dumper;

# Expand each "quantity @ cost" line into one entry per unit, then sort
my @data;
while (<DATA>) {
    chomp;
    my ($quantity, $cost) = split /\s*@\s*/;
    push @data, ($cost) x $quantity;
}
@data = sort { $a <=> $b } @data;

my %fragment = (
    A => {count => 65, ave => 0, items => 0},
    B => {count => 12, ave => 0, items => 0},
    C => {count => 24, ave => 0, items => 0},
    D => {count => 19, ave => 0, items => 0},
);

my $tgt_ave = sum(@data) / @data;

# Fill the smallest fragments first
for my $frag (sort { $fragment{$a}{count} <=> $fragment{$b}{count} } keys %fragment) {
    for (1 .. $fragment{$frag}{count}) {
        my ($cnt, $ave) = @{$fragment{$frag}}{qw/items ave/};
        # The value that would bring this fragment's running average
        # exactly to the target
        my $best = ($tgt_ave * $cnt) + $tgt_ave - ($ave * $cnt);
        my $idx  = find_best(\@data, $best, $ave, $tgt_ave, $cnt);
        my $val  = splice(@data, $idx, 1);
        #push @{$fragment{$frag}{val}}, $val;
        ++$fragment{$frag}{items};
        $fragment{$frag}{ave} = (($ave * $cnt) + $val) / ($cnt + 1);
    }
}

print Dumper(\%fragment);

# Binary search; if there is no exact match, pick the value that brings
# the fragment's average closest to the desired average
sub find_best {
    my ($data, $best, $ave, $tgt_ave, $cnt) = @_;
    my ($beg, $end, $mid) = (0, $#$data, undef);

    while ($beg <= $end) {
        $mid = int($beg + ($end - $beg) / 2);
        my $val = $data->[$mid];
        if ($val > $best) {
            $end = $mid - 1;
        }
        elsif ($val < $best) {
            $beg = $mid + 1;
        }
        else {
            return $mid;
        }
    }

    # No exact match: consider the landing spot and its neighbors
    my $minus_1 = $mid > 0       ? $mid - 1 : undef;
    my $plus_1  = $mid < $#$data ? $mid + 1 : undef;

    my ($min, $idx);
    for ($minus_1, $mid, $plus_1) {
        next if ! defined $_;
        my $val     = $data->[$_];
        my $new_ave = (($ave * $cnt) + $val) / ($cnt + 1);
        my $diff    = abs($tgt_ave - $new_ave);
        if (! defined $min || $diff < $min) {
            ($min, $idx) = ($diff, $_);
        }
    }
    return $idx;
}

__DATA__
5 @ 93.8
20 @ 93.81
10 @ 93.82
15 @ 93.83
25 @ 93.84
5 @ 93.85
20 @ 93.87
5 @ 94
35 @ 94.1
10 @ 94.2

Cheers - L~R

Re^2: Average Price Algorithm
by GrandFather (Saint) on Jan 29, 2009 at 02:36 UTC

    If you add the line:

    ++$fragment{$frag}{$val};

    following ++$fragment{$frag}{items}; in the innermost loop of the main block, you record the quantities at each price that were used to build each person's package.
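
    In context, the inner loop would then read something like this (only the tally line is new; note that the per-price counts share the fragment's hash with the count/ave/items keys, so any reporting code needs to skip those three):

        for (1 .. $fragment{$frag}{count}) {
            my ($cnt, $ave) = @{$fragment{$frag}}{qw/items ave/};
            my $best = ($tgt_ave * $cnt) + $tgt_ave - ($ave * $cnt);
            my $idx  = find_best(\@data, $best, $ave, $tgt_ave, $cnt);
            my $val  = splice(@data, $idx, 1);
            ++$fragment{$frag}{items};
            ++$fragment{$frag}{$val};    # tally units taken at this price
            $fragment{$frag}{ave} = (($ave * $cnt) + $val) / ($cnt + 1);
        }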

    Oh, small nit (that may be covered by "I left it the way it is ..."): you didn't allocate anything to E because it wasn't included in %fragment.


    Perl's payment curve coincides with its learning curve.
Re^2: Average Price Algorithm
by camelcom (Sexton) on Jan 29, 2009 at 07:33 UTC
    Thanks Limbic, but I'm not sure this approach will find the best solution. I know that, for the given example, there is a selection whose maximum error across all buckets is 0.0001754 (with three of the bucket averages exactly matching the overall average).

    Although this approach looks correct, often the best solution will be some blend of fragments which this methodology will not find. I'm thinking some kind of brute-force method may do even better, but I'm still working on options.

      (posted here instead of one level up because it seems to expand on the thoughts of the parent)

      ++ to Limbic~Region for the algorithm (grandparent node). I like it.

      It does appear, however, that it minimizes the error once for each individual fragment, rather than minimizing the error for some summation function (absolute error, statistical deviation, etc.) over the entire population.

      As a "good" solution, I think this is excellent, and it seems similar to a "good" solution to the bin packing problem that I remember from my undergrad days. It can be fooled by some nasty data.

      As with any other NP-hard or optimization problem, an approximation's "good enough" value is in the eye of the beholder, as is whether you are willing to calculate (or wait for) the optimum solution.

      You could add an optimization loop to the end of this algorithm, bounded by whatever time you are willing to wait (or some other limiting factor): start swapping values between the bins (picking which swaps will give the best change is left as an exercise; perhaps swap something from the best and the worst, or the two worst, or the worst over and the worst under), recalculate the fitness, save the state if it is better, and repeat while you still have cycles left. A sketch of that loop follows.
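
      Here is a minimal sketch of that idea, assuming each fragment's values are kept in an array (for instance by re-enabling the commented-out push in the grandparent's code); the fitness and improve names, the purely random swap selection, and the iteration bound are all placeholder choices:

      use strict;
      use warnings;
      use List::Util 'sum';

      # fitness: sum over all fragments of |fragment average - target|
      sub fitness {
          my ($frags, $tgt) = @_;
          return sum map { abs($tgt - sum(@$_) / @$_) } values %$frags;
      }

      # Try random swaps between two fragments, keeping only those
      # that improve the overall fitness
      sub improve {
          my ($frags, $tgt, $tries) = @_;
          my @names = keys %$frags;
          my $best  = fitness($frags, $tgt);
          for (1 .. $tries) {
              my ($f1, $f2) = @names[int rand @names, int rand @names];
              next if $f1 eq $f2;
              my ($i, $j) = (int rand @{$frags->{$f1}}, int rand @{$frags->{$f2}});
              ($frags->{$f1}[$i], $frags->{$f2}[$j]) =
                  ($frags->{$f2}[$j], $frags->{$f1}[$i]);
              my $new = fitness($frags, $tgt);
              if ($new < $best) { $best = $new }    # keep the swap
              else {                                # undo the swap
                  ($frags->{$f1}[$i], $frags->{$f2}[$j]) =
                      ($frags->{$f2}[$j], $frags->{$f1}[$i]);
              }
          }
          return $best;
      }

      # toy usage with made-up fragments
      my %frags = (A => [93.8, 94.2, 94.1], B => [93.81, 93.84]);
      printf "fitness: %f\n", improve(\%frags, 93.95, 10_000);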

      --MidLifeXis

      camelcom,
      This is a heuristic solution, and the problem appears to be NP-hard. That means it is not guaranteed to produce the best solution for every data set, which is why I said it did amazingly well rather than perfectly. You indicated you had an acceptable margin of error, and it looked like this could be made to fit your criteria. It is also obviously fast and can be improved.

      Cheers - L~R

Re^2: Average Price Algorithm
by camelcom (Sexton) on Jan 29, 2009 at 10:13 UTC
    I have checked the Dominus book. The partitioning problem shown there is solved recursively, but it differs slightly from this problem in that, here, you only know the quality of the fit once the overall selection for a bucket is made. Often there is some blend of allocations which fits really well, but it's easy to miss those if you get too clever with the algorithm.

    As for the measurement of fitness, I'm having trouble deciding, but I think the sum of the errors is reasonable. Certainly not the max error, because that will most often occur for allocations which are very small (1, 2, or 3).

    With monks help, I think I'm starting to get a better handle on this problem - I have an idea for a new approach based on a combination of a heuristic and brute-force.

    I work in an organisation which is (supposedly) stuffed full of smart people, but when I get stuck I always come to the monks first :)

      The sum of absolute errors, I think, is what you mean. Otherwise your fitness function would always return 0 ;-)

      You may be better off using one of the statistical error calculations (Portal:Statistics). IIRC (and stats was not my strong suit 15+ years ago, so apply salt as necessary), some of the more complex (well, more complex than summation, anyway) error functions would give "better" results; two candidates are sketched below.
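
      For illustration, here are the two fitness functions under discussion, applied to the per-fragment averages; the sample numbers are made up, and whether to weight each fragment by its item count is left open:

      use strict;
      use warnings;
      use List::Util 'sum';

      # sum of absolute errors: every fragment's miss counts linearly
      sub sum_abs_err { my ($tgt, @aves) = @_; sum map { abs($_ - $tgt) } @aves }

      # sum of squared errors: large misses are penalized disproportionately,
      # nudging an optimizer toward evening out the error across fragments
      sub sum_sq_err  { my ($tgt, @aves) = @_; sum map { ($_ - $tgt)**2 } @aves }

      # made-up fragment averages, purely to exercise the functions
      my @aves = (93.9482, 93.9471, 93.9475, 93.9469);
      printf "abs: %.6f  sq: %.10f\n",
          sum_abs_err(93.9474, @aves), sum_sq_err(93.9474, @aves);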

      --MidLifeXis