For starters, I think you switched a and b in the equations you give. The correct equations are given here and here. These equations are taken from an article on mathworld.wolfram.com on Least Squares Fitting. Notice that these equations are implicit definitions of a and b. In other words, both equations contain both variables, so you can't simply plug values into them and get a and b.

Simplifying these equations is not difficult, but it is not trivial either. The details of the simplification are covered in the article. The resulting equations are the ones you need to solve for the coefficients of the linear least squares fit, a and b. The formulas you use try to solve for a using b, and for b using a. You can't do that in Perl, not even using eval. This, however, is all math, and a bit off topic for Perlmonks. So let's talk Perl.
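Before leaving the math entirely: for a line y = a + b x, the two implicit equations are, as far as I can reconstruct them, the standard normal equations of a least squares fit. This is my own sketch, not a copy of the article's notation; all sums run over the n data points.

```latex
\begin{aligned}
n\,a + b \sum x_i &= \sum y_i \\
a \sum x_i + b \sum x_i^2 &= \sum x_i y_i
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
a &= \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \\
b &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}
\end{aligned}
```

The right-hand pair comes from solving the two linear equations on the left for a and b (Cramer's rule works). Each one depends only on sums over the data, not on the other coefficient.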

For this task a closure is a bit of overkill, as stated above. Even arrays and hash tables are excessive unless you are trying to calculate fits on several sets of data at once. In the simple case, plain old variables work just fine.
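To show the one case where a hash does earn its keep, here is a minimal sketch that fits several data sets at once. It assumes a hypothetical input layout of "label x y" per line, which is not the layout used elsewhere in this post, and the sub name coefficients is made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: "label x y", where label names the data set.
my @lines = (
    "A 0 1", "A 1 3",  "A 2 5",
    "B 0 0", "B 1 -1", "B 2 -2",
);

my %sums;    # label => running sums for that data set
for my $line (@lines) {
    my ( $label, $x, $y ) = split ' ', $line;
    my $s = $sums{$label} ||= { n => 0, x => 0, y => 0, x2 => 0, xy => 0 };
    $s->{n}++;
    $s->{x}  += $x;
    $s->{y}  += $y;
    $s->{x2} += $x * $x;
    $s->{xy} += $x * $y;
}

# Closed-form least squares solution, applied to one set of sums.
sub coefficients {
    my ($s) = @_;
    my $det = $s->{n} * $s->{x2} - $s->{x} * $s->{x};
    return if $det == 0;    # all x values equal: no unique line
    my $a0 = ( $s->{y} * $s->{x2} - $s->{x} * $s->{xy} ) / $det;
    my $b0 = ( $s->{n} * $s->{xy} - $s->{x} * $s->{y} ) / $det;
    return ( $a0, $b0 );
}

for my $label ( sort keys %sums ) {
    my ( $int, $slope ) = coefficients( $sums{$label} );
    next unless defined $int;
    print "$label: a = $int; b = $slope\n";
}
```

Each label gets its own small hash of sums, so the per-set bookkeeping stays identical to the single-set case.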

The following example uses the first two columns of the input as x and y and calculates a and b of the least squares fit.

#! /usr/bin/perl -w
use strict;

# Running sums; nothing else needs to be kept in memory.
my $sum_x  = 0;
my $sum_y  = 0;
my $sum_x2 = 0;
my $sum_xy = 0;
my $n      = 0;

# Read whitespace-separated columns; the first two are x and y.
while( <> ) {
    my ( $x, $y ) = split;
    $sum_x  += $x;
    $sum_y  += $y;
    $sum_x2 += $x * $x;
    $sum_xy += $x * $y;
    $n++;
}

# Intercept a and slope b of the fitted line y = a + b*x.
my $a = ( $sum_y * $sum_x2 - $sum_x * $sum_xy ) / ( $n * $sum_x2 - $sum_x * $sum_x );
my $b = ( $n * $sum_xy - $sum_x * $sum_y ) / ( $n * $sum_x2 - $sum_x * $sum_x );

print "a = $a; b = $b\n";

This code prints out

	a = 0.999999714285714; b = 2.00000057142857
when given the following input

-2 -3
-1 -1
 0  0.99999
 1  3.00001
 2  5
 3  7
which is verifiably correct: the input lies almost exactly on the line y = 1 + 2x.

As merlyn pointed out above, this would be reinventing the wheel for a small data set like the example. However, this implementation does not require storing the entire data set in memory, so it might be useful if the data set is large. There are other problems if the data set gets too large. The sum of the squares of x or the sum of the x*y products may grow very large, so you may get an overflow or some loss of precision in the calculation. So don't blindly trust the outputs if the data set is large. How to determine the validity of the fit is off topic for Perlmonks (and a bit over my head), but there are some resources listed in the article above, and around the internet.
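One common way to reduce the precision problem, while still reading the data in a single pass, is to keep running means and centered sums instead of the raw sums. The sketch below uses a Welford-style update, which is a different technique from the script above and not something taken from the article. The sub name fit_line is made up, and the sample data is fed through an in-memory filehandle only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fit_line is a hypothetical helper name, not part of the original script.
# It reads "x y" pairs from a filehandle one line at a time and keeps
# running means plus centered sums rather than raw sums of x*x and x*y.
sub fit_line {
    my ($fh) = @_;
    my ( $n, $mean_x, $mean_y, $sxx, $sxy ) = ( 0, 0, 0, 0, 0 );
    while ( my $line = <$fh> ) {
        my ( $x, $y ) = split ' ', $line;
        next unless defined $y;    # skip blank or short lines
        $n++;
        my $dx = $x - $mean_x;     # deviation from the previous mean of x
        $mean_x += $dx / $n;
        $mean_y += ( $y - $mean_y ) / $n;
        $sxx += $dx * ( $x - $mean_x );    # running sum of (x - mean_x)^2
        $sxy += $dx * ( $y - $mean_y );    # running sum of (x - mean_x)*(y - mean_y)
    }
    return if $sxx == 0;    # fewer than two distinct x values: no unique line
    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    return ( $intercept, $slope );
}

# The sample input from this post, read through an in-memory filehandle.
my $data = "-2 -3\n-1 -1\n 0  0.99999\n 1  3.00001\n 2  5\n 3  7\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my ( $intercept, $slope ) = fit_line($fh);
close $fh;
printf "a = %.9f; b = %.9f\n", $intercept, $slope;
```

On the sample input this should agree with the a and b printed by the original script to within rounding. The centered sums stay close to the spread of the data rather than its magnitude, which is why they tend to lose fewer digits, but this says nothing about how good the fit itself is.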

update: I cleaned up some spelling, and completely changed my position here.


In reply to Re(2): Closures and Statistics by meta4
in thread Closures and Statistics by dimmesdale
