in reply to using Statistics::Regression

Statistics::Regression is ten years old and has a horrible interface and useless documentation. Maybe someone can suggest a better module.

I haven't tested it, but it looks like the way you would have to use it to get a cubic fit is:

my $reg = Statistics::Regression->new("Pain", ["C", "X", "X**2", "X**3 +", "Y"]); $reg->include(1.0, [1.0, $x, $x**2, $x**3, $y]);

Replies are listed 'Best First'.
Re^2: using Statistics::Regression
by Random_Walk (Prior) on Apr 19, 2017 at 15:31 UTC

    And there I was thinking of the pain I would save myself using a ready cut module ;)

    I must say your two lines of code does rather beat the existing docco. Of to give it a go, thanks.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      Ah, I've got it. Try this instead:
      my $reg = Statistics::Regression->new("Pain", ["C", "X", "X**2", "X**3 +"]); $reg->include($y, [1.0, $x, $x**2, $x**3]);

        Thank you so much Anonymonk, now I am getting somewhere. The fragment of code I am now using, culled from a larger script goes like this ...

        # OK, now lets use linear regression to fit my $reg = Statistics::Regression->new( $data->{Name}, [ "Const", " +Theta1", "Theta2" ] ); # Add data points for ( @{$data->{values}} ) { # some time conversion goes on here to make times into Epo +ch ... my $epoch = mktime($s, $m, $h, $D, $M-1, $Y); my $x = $_->[2]; print "\$reg->include ( $epoch, [1, $x, ". $x**2 ." ] )\n" +; $reg->include ( $epoch, [1, $x, $x**2 ] ); } print "Results are ...\n"; $reg->print();

        Here is some output, now mostly it is working, but then on one set of data it chokes...

        # This is printed by the lines above, this one works fine... $reg->include ( 1491858157, [1, 95.24, 9070.6576 ] ) $reg->include ( 1491944593, [1, 95.24, 9070.6576 ] ) $reg->include ( 1492030986, [1, 95.22, 9066.8484 ] ) $reg->include ( 1492117236, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492203637, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492290038, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492376435, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492462840, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492549241, [1, 95.23, 9068.7529 ] ) $reg->include ( 1492621259, [1, 95.24, 9070.6576 ] ) Results are ... **************************************************************** Regression '3116.dpepicqt.SYSAUX' **************************************************************** Name Theta StdErr T-stat [0='const'] -22405806112434.5120 16790271247707.7460 -1.3 +3 [1='Theta1'] 470587750270.0963 352619498612.0823 1.3 +3 [2='Theta2'] -2470766737.1275 1851377334.2664 -1.33 R^2= 0.206, N= 10, K= 3 **************************************************************** # This one chokes ... $reg->include ( 1491858157, [1, 93.6, 8760.96 ] ) $reg->include ( 1491944593, [1, 93.6, 8760.96 ] ) $reg->include ( 1492030986, [1, 93.6, 8760.96 ] ) $reg->include ( 1492117236, [1, 93.6, 8760.96 ] ) $reg->include ( 1492203637, [1, 93.6, 8760.96 ] ) $reg->include ( 1492290038, [1, 93.6, 8760.96 ] ) $reg->include ( 1492376435, [1, 93.6, 8760.96 ] ) $reg->include ( 1492462840, [1, 93.6, 8760.96 ] ) $reg->include ( 1492549241, [1, 93.6, 8760.96 ] ) $reg->include ( 1492621259, [1, 93.64, 8768.4496 ] ) Results are ... **************************************************************** Regression '3116.dpepicqt.SYSTEM' **************************************************************** Report.pl::Statistics::Regression:standarderrors: I cannot compute the + theta-covariance matrix for variable 3 0 at C:/Perl64/site/lib/Statistics/Regression.pm line 619. Statistics::Regression::standarderrors(Statistics::Regression= +HASH(0x44dfe90)) called at C:/Perl64/site/lib/Statistics/Regression.p +m line 430 Statistics::Regression::print(Statistics::Regression=HASH(0x44 +dfe90)) called at Report.pl line 125 main::predict(HASH(0x4340ec8), 10) called at Report.pl line 85

        I am guessing I may not have enough variation in that data for it to find an optimum, but if anyone can see I am barking up the wrong tree, please do shout

        Cheers,
        R.

        Pereant, qui ante nos nostra dixerunt!

        Update

        I have now tried it with a cubic term, and it failed on an earlier data set. Then I tried it with just the Constant and an X terms, no square or higher, and it ran the complete set. So now I can get a best fit line. Next step is to see if I can feed it some guess values for the theta vector.
Re^2: using Statistics::Regression
by Anonymous Monk on Apr 19, 2017 at 17:03 UTC
    Nope. I figured the first 1.0 was a weight, and you could just leave it like that if you didn't care about weighting. No go. The module crashes if the first number is the same for all data points. Hm.
      Argh! The weight is an optional third argument to include(), which isn't mentioned anywhere in the documentation.