neilwatson has asked for the wisdom of the Perl Monks concerning the following question:
Greetings,
I've created a utility using Statistics::LineFit and another using Gnuplot and fed both the same sample data. The results differ, so I must have made a mistake, but I can't see where.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::LineFit;
use Time::Local;
use Data::Dumper;

my @x_axis;
my @y_axes;

sub date_to_epoch {
    my $date = shift;
    my ( $y, $m, $d ) = split /-/, $date;
    return timelocal( '59', '59', '23', $d, $m, $y );
    #return timelocal( '0', '0', '0', $d, $m, $y );
}

sub max_value {
    my @array = @_;
    my $max = $array[0];
    for ( my $i = 0; $i <= $#array; $i++ ) {
        $max = $array[$i] if ( $array[$i] > $max );
    }
    return $max;
}

my @epochs;
while (<DATA>) {
    next if ( m/^#/ );
    chomp;
    if ( my @line = split /\s+/ ) {
        my $epoch = date_to_epoch( $line[0] );
        # factor down epoch or slope is too shallow.
        push @x_axis, $epoch;
        shift @line;
        for ( my $y = 0; $y <= $#line; $y++ ) {
            push @{$y_axes[$y]}, $line[$y];
        }
    }
}

print Dumper( \@x_axis );
print Dumper( \@y_axes );

my $lineFit = Statistics::LineFit->new( 0, 0 ); # TODO change 2nd to 1
$lineFit->setData( \@x_axis, \@{$y_axes[0]} ) or die "Invalid regression data\n";
my ( $intercept, $slope ) = $lineFit->coefficients();
print "Slope(m): $slope Y-intercept(b): $intercept\n";

my %fitline;
$fitline{y1} = $intercept;
$fitline{x1} = 0;
$fitline{y2} = max_value( @{$y_axes[0]} );
$fitline{x2} = ( $fitline{y2} - $fitline{y1} ) / $slope + $fitline{x1};
print Dumper( \%fitline );

__DATA__
# date notkept hosts
2014-04-01 50 10
2014-04-02 63 11
2014-04-03 120 12
2014-04-04 55 20
2014-04-05 60 22
2014-04-06 63 25
2014-04-07 52 24
```
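One likely bug worth noting in `date_to_epoch`: Time::Local's `timelocal` takes the month as 0 to 11 (the same convention `localtime` returns), so passing `$m` straight from the date string shifts every sample forward by one month. The first dumped epoch, 1399003199, works out to 23:59:59 on 1 May 2014 in a UTC-4 zone such as EDT, not 1 April. Because the samples stay exactly one day apart, this should not change the slope, but it does move the intercept. A minimal corrected sketch of the helper, with a round-trip check through `localtime`:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert "YYYY-MM-DD" to epoch seconds at 23:59:59 local time.
# timelocal() expects the month as 0..11, so the parsed month is decremented.
sub date_to_epoch {
    my ($date) = @_;
    my ( $y, $m, $d ) = split /-/, $date;
    return timelocal( 59, 59, 23, $d, $m - 1, $y );
}

# Round-trip through localtime (which also reports a 0-based month
# and years since 1900) to confirm the date now lands in April.
my @t = localtime( date_to_epoch('2014-04-01') );
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
    $t[5] + 1900, $t[4] + 1, $t[3], $t[2], $t[1], $t[0];
```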
```gnuplot
#!/usr/bin/gnuplot
#set output "test.png"
set title "Promises not kept"
set xlabel "Date"
set ylabel "Count"
set rmargin 7
set border linewidth 2
set style line 1 linecolor rgb 'blue' linetype 1 linewidth 2
set style line 2 linecolor rgb 'black' linetype 1 linewidth 2
set style fill solid
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"
set grid front
set grid
set autoscale
# 1e8 reduces the epoch seconds for a less flat line.
h(x) = m2 * x + b2
fit h(x) 'test.dat' using 1:3 via m2,b2
p(x) = m1 * x + b1
fit p(x) 'test.dat' using 1:2 via m1,b1
#set terminal png enhanced size 1024,768
plot 'test.dat' using 1:2 title 'Promises not kept' with boxes lc rgb "orange", \
     p(x) title 'Promise Trend' with lines linestyle 1, \
     h(x) title 'Host Trend' with lines linestyle 2
```
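Even if both fits converged, the intercepts would not be directly comparable. gnuplot 4.x (the current series in 2014) stores time data as seconds since 2000-01-01 00:00 UTC rather than since the Unix epoch (gnuplot 5 switched to 1970), and it reads `%Y-%m-%d` as midnight UTC, whereas the Perl script uses 23:59:59 local time. Shifting x by a constant leaves the slope alone and moves the intercept by slope times the shift. The sketch below uses a hand-rolled least-squares fit as a stand-in for Statistics::LineFit, so it runs without CPAN modules, and fits the notkept column against both time bases. The 2000-01-01 epoch is an assumption about the gnuplot version in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Ordinary least-squares fit of y = m*x + b, computed about the means
# so that large x values (epoch seconds) do not swamp the arithmetic.
sub ols {
    my ( $x, $y ) = @_;
    my $n = @$x;
    my ( $sum_x, $sum_y ) = ( 0, 0 );
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ( $mean_x, $mean_y ) = ( $sum_x / $n, $sum_y / $n );
    my ( $sxy, $sxx ) = ( 0, 0 );
    for my $i ( 0 .. $n - 1 ) {
        my $dx = $x->[$i] - $mean_x;
        $sxy += $dx * ( $y->[$i] - $mean_y );
        $sxx += $dx * $dx;
    }
    my $m = $sxy / $sxx;
    return ( $m, $mean_y - $m * $mean_x );
}

# The "notkept" column from test.dat.
my @dates   = map { sprintf '2014-04-%02d', $_ } 1 .. 7;
my @notkept = ( 50, 63, 120, 55, 60, 63, 52 );

# Midnight UTC, which is how gnuplot reads timefmt "%Y-%m-%d".
my @unix = map {
    my ( $y, $m, $d ) = split /-/;
    timegm( 0, 0, 0, $d, $m - 1, $y );
} @dates;

# Assumption: gnuplot 4.x counts time in seconds since 2000-01-01 UTC.
my $gnuplot_epoch = timegm( 0, 0, 0, 1, 0, 2000 );    # 946684800
my @gp4 = map { $_ - $gnuplot_epoch } @unix;

my ( $m_unix, $b_unix ) = ols( \@unix, \@notkept );
my ( $m_gp4,  $b_gp4 )  = ols( \@gp4,  \@notkept );
printf "Unix epoch:     m = %.6e  b = %.4f\n", $m_unix, $b_unix;
printf "gnuplot 4 time: m = %.6e  b = %.4f\n", $m_gp4,  $b_gp4;
```

The two slopes should agree with each other and with the Statistics::LineFit value of about -2.232e-05 per second, while the intercepts differ by `-m * 946684800`.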
Contents of `test.dat`:

```
# date notkept hosts
2014-04-01 50 10
2014-04-02 63 11
2014-04-03 120 12
2014-04-04 55 20
2014-04-05 60 22
2014-04-06 63 25
2014-04-07 52 24
```
Perl output:

```
$VAR1 = [
          1399003199,
          1399089599,
          1399175999,
          1399262399,
          1399348799,
          1399435199,
          1399521599
        ];
$VAR1 = [
          [
            '50',
            '63',
            '120',
            '55',
            '60',
            '63',
            '52'
          ],
          [
            '10',
            '11',
            '12',
            '20',
            '22',
            '25',
            '24'
          ]
        ];
Slope(m): -2.23214285714286e-05 Y-intercept(b): 31299.6785491071
$VAR1 = {
          'y1' => '31299.6785491071',
          'x2' => 1396849599,
          'y2' => 120,
          'x1' => 0
        };
```
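As a check on which result is right, the least-squares slope for the notkept column can be worked by hand. Measuring x in days from the first sample (0 to 6, so the mean is 3), the mean of y drops out of the numerator because the deviations of x sum to zero; dividing by 86400 then converts to the per-second units the Perl script uses:

$$
\begin{aligned}
m &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}, \qquad x_i = 0,1,\dots,6,\quad \bar{x} = 3 \\
\sum_i (x_i-\bar{x})\,y_i &= -3(50) - 2(63) - 1(120) + 0(55) + 1(60) + 2(63) + 3(52) = -54 \\
\sum_i (x_i-\bar{x})^2 &= 9 + 4 + 1 + 0 + 1 + 4 + 9 = 28 \\
m_{\text{day}} &= -\tfrac{54}{28} \approx -1.92857, \qquad m_{\text{s}} = \frac{m_{\text{day}}}{86400} \approx -2.23214 \times 10^{-5}
\end{aligned}
$$

This matches the Statistics::LineFit slope of -2.23214285714286e-05, which points at the gnuplot fit rather than the Perl one.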
gnuplot fit output:

```
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = 1.44796e-07      +/- 5.823e-05    (4.022e+04%)
b1              = 1                +/- 2.62e+04     (2.62e+06%)


correlation matrix of the fit parameters:

               m1     b1
m1             1.000
b1            -1.000  1.000
```
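The gnuplot log itself hints at the problem. b1 = 1 with a 2.62e+06% error looks as if it never moved from its starting value (1 appears to be what gnuplot uses for a fit parameter that was not initialised), and the m1/b1 correlation of -1.000 says the two parameters are almost perfectly degenerate. That degeneracy comes from the data rather than from gnuplot: for a straight-line least-squares fit, the correlation between the slope and intercept estimates is -mean(x)/sqrt(mean(x^2)), which approaches -1 when the x values sit far from zero relative to their spread. The plain-Perl sketch below computes it for raw time values (assumed to be seconds since 2000-01-01 UTC, as in gnuplot 4.x) and for days since the first sample; `slope_intercept_corr` is a hypothetical helper written for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: correlation between the least-squares slope and
# intercept estimates for x values @x, i.e. -mean(x) / sqrt(mean(x**2)).
sub slope_intercept_corr {
    my @x = @_;
    my ( $sum, $sum_sq ) = ( 0, 0 );
    for (@x) {
        $sum    += $_;
        $sum_sq += $_ * $_;
    }
    my $n = @x;
    return -( $sum / $n ) / sqrt( $sum_sq / $n );
}

# Assumption: gnuplot 4.x time values, i.e. seconds since 2000-01-01 UTC,
# for midnight on 2014-04-01 .. 2014-04-07.
my $t0   = timegm( 0, 0, 0, 1, 3, 2014 ) - timegm( 0, 0, 0, 1, 0, 2000 );
my @raw  = map { $t0 + $_ * 86400 } 0 .. 6;
my @days = 0 .. 6;    # days since the first sample

printf "raw seconds: corr(m,b) = %.9f\n", slope_intercept_corr(@raw);
printf "days:        corr(m,b) = %.4f\n", slope_intercept_corr(@days);
```

Shifting and rescaling x before fitting (which is what the "1e8 reduces the epoch seconds" comment in the script seems to have been reaching for) is the usual way to give an iterative Marquardt-Levenberg fit such as gnuplot's a better-conditioned problem; supplying sensible starting values for m1 and b1 also helps.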
Note that m1 and b1 from gnuplot are not the same as Slope and Y-intercept from Perl. Why?
Neil Watson
watson-wilson.ca
Re: Statistics::LineFit versus gnuplot, results differ
by sn1987a (Curate) on Apr 25, 2014 at 19:04 UTC
Re: Statistics::LineFit versus gnuplot, results differ
by kevbot (Vicar) on Apr 26, 2014 at 05:45 UTC
Re: Statistics::LineFit versus gnuplot, results differ
by sn1987a (Curate) on Apr 27, 2014 at 02:11 UTC