PerlMonks  

Re^4: using Statistics::Regression

by Random_Walk (Prior)
on Apr 19, 2017 at 19:52 UTC ( [id://1188310]=note )


in reply to Re^3: using Statistics::Regression
in thread using Statistics::Regression

Thank you so much Anonymonk, now I am getting somewhere. The fragment of code I am now using, culled from a larger script goes like this ...

# OK, now lets use linear regression to fit
my $reg = Statistics::Regression->new( $data->{Name}, [ "Const", "Theta1", "Theta2" ] );

# Add data points
for ( @{$data->{values}} ) {
    # some time conversion goes on here to make times into Epoch ...
    my $epoch = mktime($s, $m, $h, $D, $M-1, $Y);
    my $x = $_->[2];
    print "\$reg->include ( $epoch, [1, $x, ". $x**2 ." ] )\n";
    $reg->include ( $epoch, [1, $x, $x**2 ] );
}
print "Results are ...\n";
$reg->print();

Here is some output, now mostly it is working, but then on one set of data it chokes...

# This is printed by the lines above, this one works fine...
$reg->include ( 1491858157, [1, 95.24, 9070.6576 ] )
$reg->include ( 1491944593, [1, 95.24, 9070.6576 ] )
$reg->include ( 1492030986, [1, 95.22, 9066.8484 ] )
$reg->include ( 1492117236, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492203637, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492290038, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492376435, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492462840, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492549241, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492621259, [1, 95.24, 9070.6576 ] )
Results are ...
****************************************************************
Regression '3116.dpepicqt.SYSAUX'
****************************************************************
Name            Theta                   StdErr                  T-stat
[0='const']     -22405806112434.5120    16790271247707.7460     -1.33
[1='Theta1']    470587750270.0963       352619498612.0823       1.33
[2='Theta2']    -2470766737.1275        1851377334.2664         -1.33

R^2= 0.206, N= 10, K= 3
****************************************************************

# This one chokes ...
$reg->include ( 1491858157, [1, 93.6, 8760.96 ] )
$reg->include ( 1491944593, [1, 93.6, 8760.96 ] )
$reg->include ( 1492030986, [1, 93.6, 8760.96 ] )
$reg->include ( 1492117236, [1, 93.6, 8760.96 ] )
$reg->include ( 1492203637, [1, 93.6, 8760.96 ] )
$reg->include ( 1492290038, [1, 93.6, 8760.96 ] )
$reg->include ( 1492376435, [1, 93.6, 8760.96 ] )
$reg->include ( 1492462840, [1, 93.6, 8760.96 ] )
$reg->include ( 1492549241, [1, 93.6, 8760.96 ] )
$reg->include ( 1492621259, [1, 93.64, 8768.4496 ] )
Results are ...
****************************************************************
Regression '3116.dpepicqt.SYSTEM'
****************************************************************
Report.pl::Statistics::Regression:standarderrors: I cannot compute the theta-covariance matrix for variable 3 0 at C:/Perl64/site/lib/Statistics/Regression.pm line 619.
    Statistics::Regression::standarderrors(Statistics::Regression=HASH(0x44dfe90)) called at C:/Perl64/site/lib/Statistics/Regression.pm line 430
    Statistics::Regression::print(Statistics::Regression=HASH(0x44dfe90)) called at Report.pl line 125
    main::predict(HASH(0x4340ec8), 10) called at Report.pl line 85

I am guessing I may not have enough variation in that data for it to find an optimum, but if anyone can see that I am barking up the wrong tree, please do shout.

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!

Update

I have now tried it with a cubic term, and it failed on an earlier data set. Then I tried it with just the constant and an X term, no square or higher, and it ran the complete set. So now I can get a best fit line. Next step is to see if I can feed it some guess values for the theta vector.

Replies are listed 'Best First'.
Re^5: using Statistics::Regression
by Anonymous Monk on Apr 19, 2017 at 20:21 UTC
    You need at least 3 distinct values of x to produce a quadratic fit. Similarly, you need at least 4 for a cubic, and at least 2 for a linear fit.
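A short sketch of why the count of distinct x values matters, assuming the fit is ordinary least squares on the columns 1, x, ..., x^k (the symbols X, m and k below are introduced here for illustration and do not come from the module's documentation):

```latex
% Design matrix for a polynomial fit of order k on N samples
X = \begin{pmatrix}
  1 & x_1 & x_1^2 & \cdots & x_1^k \\
  \vdots & \vdots & \vdots & & \vdots \\
  1 & x_N & x_N^2 & \cdots & x_N^k
\end{pmatrix},
\qquad
\hat{\theta} = (X^{\top} X)^{-1} X^{\top} y .

% If x takes only m distinct values, X has at most m distinct rows, so
\operatorname{rank}(X) \le m .

% X^T X is (k+1) x (k+1) and has the same rank as X, hence
m < k + 1 \;\Longrightarrow\; X^{\top} X \text{ is singular, and } \hat{\theta} \text{ is not unique.}
```

With m >= k+1 distinct values the k+1 columns are linearly independent (a Vandermonde argument), which gives the thresholds of 2, 3 and 4 distinct values for linear, quadratic and cubic fits. It seems likely that a singular X^T X is what the "cannot compute the theta-covariance matrix" message is reporting, though that reading of the module's internals is an assumption.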

      Yeh, I remembered that when I bombed out on a case with two samples, so I have added a guard clause so that I do not try to use LR if I have fewer than 10 samples. It appears there are still some very degenerate cases that kill this module even with plenty of samples. I have now updated my code to divide Max(Y) by Min(Y) and use this 'change' value to alter the number of terms I use. Strangely, it looks like it is cases where there is a sudden step change in the data that trigger this failure. I have one case where there is a 56% change and that kills the module.

      # Lets play with order based on Max/Min value
      my $change = $data->{Max}{AVG_Percentage_Used} / $data->{Min}{AVG_Percentage_Used};
      print "Change is $change\n";
      my $order = 1;
      $order = 2 if $change > 1.05;
      $order = 3 if $change > 1.15;
      $order = 4 if $change > 1.30;

      my @Thetas = 'Const';    # Set Thetas for zero order
      push @Thetas, 'Theta'.$_ for 1 .. $order;
      my $reg = Statistics::Regression->new( $data->{Name}, \@Thetas );

      # Add data points
      for ( @{$data->{values}} ) {
          my $epoch = mktime($s, $m, $h, $D, $M-1, $Y);
          my $x = $_->[2];
          my @Data = 1;
          push @Data, $x**$_ for 1..$order;
          print "\$reg->include ( $epoch, [".(join ", ", @Data)." ] )\n";
          $reg->include ( $epoch, \@Data );
      }
      print "Results are ...\n";
      $reg->print();

      That appears to guard the cases where I had very little movement in Y over the series, but this one still kills it. The value 'Change' in this debug output is Max/Min, so here there is a 30% change.

      Change is 1.3
      $reg->include ( 1491859118, [1, 3.25, 10.5625 ] )
      $reg->include ( 1491902520, [1, 3.25, 10.5625 ] )
      $reg->include ( 1492032609, [1, 2.5, 6.25 ] )
      $reg->include ( 1492117432, [1, 2.5, 6.25 ] )
      $reg->include ( 1492204208, [1, 2.5, 6.25 ] )
      $reg->include ( 1492291088, [1, 2.5, 6.25 ] )
      $reg->include ( 1492377875, [1, 2.5, 6.25 ] )
      $reg->include ( 1492464416, [1, 2.5, 6.25 ] )
      $reg->include ( 1492551241, [1, 2.5, 6.25 ] )
      $reg->include ( 1492623578, [1, 2.5, 6.25 ] )
      Results are ...
      ****************************************************************
      Regression 'gotsvl2143.dpcmsr1t.TOOLS'
      ****************************************************************
      Report.pl::Statistics::Regression:standarderrors: I cannot compute the theta-covariance matrix for variable 3 0

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
        It's not the amount of change that is the problem, it's the number of distinct values. In your last example, there are only 2 different values of x (2.5 and 3.25), so you can't get more than a linear fit. This is a fundamental limitation of the underlying mathematics.
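One way to act on this is to cap the polynomial order by the number of distinct x values rather than by Max/Min. The following is only a sketch: `max_poly_order` is a hypothetical helper name, it uses core Perl only, and it treats two x values as equal when they stringify identically, which may be too strict for noisy floating-point data.

```perl
use strict;
use warnings;

# Hypothetical helper: the highest polynomial order the x values can support.
# An order-k fit has k+1 coefficients, so it needs at least k+1 distinct x values.
sub max_poly_order {
    my ($xs, $wanted) = @_;

    my %seen;
    $seen{$_}++ for @$xs;            # hash keys collapse repeated values
    my $distinct = keys %seen;       # count of distinct x values

    my $limit = $distinct - 1;       # 2 distinct -> linear, 3 -> quadratic, ...
    $limit = 0 if $limit < 0;
    return $wanted < $limit ? $wanted : $limit;
}

# The x values from the failing example above: only 3.25 and 2.5 occur.
my @x = (3.25, 3.25, (2.5) x 8);
my $order = max_poly_order(\@x, 2);
print "Using order $order\n";        # prints "Using order 1"
```

The returned value could then replace `$order` before building `@Thetas`, so that the quadratic term is only added when at least three distinct x values are present.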
