tomazos has asked for the wisdom of the Perl Monks concerning the following question:

Please consider the following function called inaccuracy. It looks at a long array called @sales which contains a series of integers:

sub inaccuracy {
    my ($base, $multiple, $log) = @_;
    my $inaccuracy = 0;
    my $interval   = 0;
    foreach my $sale (@sales) {
        $interval++;
        $inaccuracy += ( $sale - $base - $multiple * log($interval) / log($log) ) ** 2;
    }
    return $inaccuracy;
}

The challenge is to write a function called estimate that analyses the data in @sales and comes up with values for $base, $multiple and $log such that when they are passed to inaccuracy they minimize the value returned by inaccuracy.

Test data is provided below if anyone is interested in giving this a shot...

@sales = (12920, 12640, 12540, 12680, 12775, 12525, 12570, 12670, 12680, 12775, 12885, 12770, 12585, 12605, 12560, 12635, 12590, 12560, 12535, 12465, 12465, 12455, 12570, 12365, 12505, 12555, 12530, 12280, 12320, 11930, 12070, 12155, 11735, 11625, 11765, 11695, 11740, 11820, 11965, 11925, 12000, 11950, 11995, 11695, 11515, 11705, 11795, 11915, 11750, 11725, 11585, 11905, 12070, 11810, 11735, 11555, 11415, 11260, 11050, 11025, 10820, 10790, 10900, 10980, 11000, 10900, 10625, 10655, 10670, 10690, 10670, 10675, 10535, 10395, 10475, 10380, 10330, 10375, 10325, 10190, 10210, 10375, 10310, 10260, 10375, 10255, 10520, 10260, 10185, 10060, 9870, 10030, 10030, 9975, 9770, 9715, 9400, 9215, 9075, 9125, 9150, 9005, 8445, 8160, 8020, 8140, 8140, 8125, 7940, 7950, 7920, 7895, 7850, 7715, 7635, 7545, 7550, 7620, 7560, 7510, 7485, 7460, 7465, 7425, 7310, 7250, 7160, 7135, 7235, 7260, 7640, 7505, 7410, 7425, 7545, 7405, 7505, 7420, 7765, 7695, 7740, 7975, 7675, 7655, 7835, 8120, 7735, 7680, 7775, 7960, 8025, 8125, 8330, 8700, 8765, 8855, 8985, 9120, 9095, 9110, 9250, 9505, 9595, 9640, 9640, 9710, 9855, 9950, 10020, 10105, 10160, 10115, 10205, 10275, 10130, 10220, 10220, 10310, 10335, 10430, 10270, 9930, 9910, 9795, 10040, 10060, 10240, 10215, 9965, 9480, 9400, 9555, 9455, 9595, 9615, 9090, 9140, 8910, 9000, 8950, 8740, 8670, 8525, 8575, 8225, 8225, 8090, 7920, 7755, 7775, 8040, 7740, 7650, 7605, 7745, 7675, 7650, 7690, 7430, 7410, 7320, 7310, 7380, 7310, 7235, 7305, 7395, 7390, 7255, 7350, 7205, 7260, 7210, 7460, 7240, 7125, 6825, 6895, 6825, 7025, 7120, 7170, 6965, 7020, 7140, 7215, 7125, 6990, 6965, 6970, 7020, 7470, 7470, 7405, 7345, 7370, 7440, 7415, 7575, 7570, 7635, 7475, 7560, 7725, 7610, 7545, 7640, 7620, 7505, 7485, 7505, 7435, 7635, 7590, 7545, 7660, 7545, 7435, 7505, 7475, 7400, 7380, 7420, 7450, 7125, 7210, 7520, 7565, 7565, 7680, 7930, 7940, 7985, 7960, 7885, 7950, 8150, 8270, 8540, 8590, 8540, 8490, 8095, 8450, 8265, 8425, 8575, 8480, 8325, 8300, 8235, 8190, 8045, 
8260, 8050, 8140, 8090, 8020, 8085, 7970, 8015, 7900, 7855, 7695, 7770, 7795, 7725, 7825, 7825, 7710, 7715, 7760, 7690, 7665, 7590, 7595, 7375, 7250, 7155, 7065, 6830, 6300, 6115, 5865, 5815, 5605, 5465, 5235, 5155, 4790, 4670, 4765, 4810, 4730, 4380, 4305, 4060, 3895, 3945, 4105, 4330, 4305, 4290, 4485, 4100, 4055, 4035, 4035, 4100, 4035, 4350, 4415, 4530, 4555, 4525, 4500, 4520, 4450, 4305, 4280, 4330, 4445, 4400, 4445, 4450, 4445, 4495, 4450, 4515, 4540, 4660, 4805, 4805, 4895, 4920, 4990, 5060, 5015, 5080, 5265);

Re: Numerical Analysis Challenge
by Masem (Monsignor) on Jan 03, 2002 at 00:41 UTC
    Basically, you're trying to find the regression for the function:
    sales = base_sale + K * log( i )
    
    where K = multiple/log( logbase ). There's no way by regression alone to determine multiple or logbase separately.
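
To see why only K can be determined, here is a minimal sketch that calls inaccuracy with two different ($multiple, $log) pairs sharing the same ratio multiple/log(logbase). The short @sales list is just the first five values of the test data, and the parameter values (12900, -100, 2 and -200, 4) are arbitrary choices for illustration, not fitted results.

```perl
use strict;
use warnings;

# First five values of the test data, used only to keep the example short.
my @sales = (12920, 12640, 12540, 12680, 12775);

sub inaccuracy {
    my ($base, $multiple, $log) = @_;
    my $inaccuracy = 0;
    my $interval   = 0;
    foreach my $sale (@sales) {
        $interval++;
        $inaccuracy += ( $sale - $base - $multiple * log($interval) / log($log) ) ** 2;
    }
    return $inaccuracy;
}

# Both pairs give the same K = multiple / log(logbase),
# because log(4) = 2 * log(2). The base and multiples are arbitrary.
my $first  = inaccuracy(12900, -100, 2);
my $second = inaccuracy(12900, -200, 4);

printf "first=%.6f second=%.6f\n", $first, $second;
```

The two results should agree up to floating-point rounding, so any fit can only pin down K, not $multiple and $log individually.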

    The above regression is not linear in i, but if you transform i to x via x = log i (or i = e^x), then you get:
    sales = base_sale + K * x
    
    which is a straight linear regression once you apply the transform to your interval variable. Thus, you simply have to accumulate a few sums over your data set (the sums of x, y, x*x and x*y), and you're all set; the exact equations for the slope and intercept should be in any numerical math text (I don't see any Perl modules that do regression easily, but the routine isn't hard for this).
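
The summation approach described above can be sketched as follows, using the standard closed-form least-squares formulas for slope and intercept. This is one possible estimate, not the only one: since only K is identifiable, the sketch assumes $log is fixed to e so that $multiple equals K. The @demo data is hypothetical, generated exactly from base = 100 and K = -5, and estimate takes the data as arguments rather than reading a global @sales.

```perl
use strict;
use warnings;

# Ordinary least squares fit of sales = base + K * log(i) for i = 1..N.
# Returns ($base, $multiple, $log) with $log fixed to e (an assumption),
# so that $multiple / log($log) is simply K.
sub estimate {
    my @sales = @_;
    my $n = @sales;
    die "need at least two data points\n" if $n < 2;

    my ($sum_x, $sum_y, $sum_xx, $sum_xy) = (0, 0, 0, 0);
    for my $i (1 .. $n) {
        my $x = log($i);          # transformed interval
        my $y = $sales[$i - 1];
        $sum_x  += $x;
        $sum_y  += $y;
        $sum_xx += $x * $x;
        $sum_xy += $x * $y;
    }

    # Closed-form slope and intercept of the least-squares line.
    my $k    = ($n * $sum_xy - $sum_x * $sum_y) / ($n * $sum_xx - $sum_x ** 2);
    my $base = ($sum_y - $k * $sum_x) / $n;

    return ($base, $k, exp(1));
}

# Hypothetical data that follows the model exactly.
my @demo = map { 100 - 5 * log($_) } 1 .. 20;
my ($base, $multiple, $log) = estimate(@demo);
printf "base=%.4f multiple=%.4f log=%.4f\n", $base, $multiple, $log;
```

On the hypothetical data the fit should recover a base near 100 and a multiple near -5. For the real data, calling estimate(@sales) and passing the three returned values to inaccuracy would give the minimized sum of squares.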

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      Of course, Masem is largely right; however, it looks to me like there is a module, Statistics::OLS, that will do linear least squares, which is exactly what you would want. I have never used it though, so YMMV for sure.

      Scott

      Dooh. You are right: $multiple and $log combine to give a single constant.

      Transforming on x = log i turns it into a straight linear regression, and from some Java code I found lying around the net I've got the algorithm for getting the line of best fit.

      Theoretically I should be all set. Thanks for your help.