
To compare your perl and gnuplot results more directly, I took the epoch time values and placed them in the test.dat file.

test.dat

# date     notkept hosts
1399003199 50      10
1399089599 63      11
1399175999 120     12
1399262399 55      20
1399348799 60      22
1399435199 63      25
1399521599 52      24

Then I performed the fit with this simpler gnuplot script.

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale
p(x) = m1*x + b1
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

I get the following:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = 4.65548e-08      +/- 5.823e-05    (1.251e+05%)
b1              = 1                +/- 8.148e+04    (8.148e+06%)

Adding initial guesses to the gnuplot script results in a slope value that is closer to the perl result:

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale

p(x) = m1*x + b1
m1 = -1e-5
b1 = 30000
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -2.13926e-05     +/- 5.736e-05    (268.1%)
b1              = 30000            +/- 8.027e+04    (267.6%)
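
As a rough cross-check of the two gnuplot results, the sum of squared residuals (RSS) can be evaluated for each reported parameter pair. This is a minimal Python sketch, not part of the gnuplot run; the data are copied from test.dat, the parameters are the rounded values printed above, and the helper name rss is made up for illustration.

```python
# Data copied from test.dat: epoch seconds and the "notkept" column.
xs = [1399003199, 1399089599, 1399175999, 1399262399,
      1399348799, 1399435199, 1399521599]
ys = [50, 63, 120, 55, 60, 63, 52]

def rss(m, b):
    """Sum of squared residuals of the line y = m*x + b over the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Parameter values as printed by gnuplot (rounded to six digits).
rss_default = rss(4.65548e-08, 1.0)       # fit without initial guesses
rss_guessed = rss(-2.13926e-05, 30000.0)  # fit with initial guesses

print(f"no initial guesses:   RSS = {rss_default:.1f}")
print(f"with initial guesses: RSS = {rss_guessed:.1f}")
print(f"relative difference:  {(rss_default - rss_guessed) / rss_guessed:.1%}")
```

With these rounded parameters the two sums should differ by only about 3%, which would be consistent with a very flat fit surface: quite different slope and intercept pairs describe the data almost equally well.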

Not all curve-fitting algorithms are equal. This suggests that the gnuplot algorithm is more sensitive to the initial guesses than the perl algorithm, or that the perl algorithm generates better initial guesses from the input data than gnuplot does. This is just an educated guess; I have not looked at the code for either algorithm.

If I normalize the x-axis values by dividing them by the first x value, which brings the x values closer in magnitude to the y values, I get a result that is closer to that of your perl code.

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale

p(x) = m1 * (x / 1399003199)  + b1
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -31227.8         +/- 8.025e+04    (257%)
b1              = 31299.7          +/- 8.026e+04    (256.4%)

Note that you need to divide the m1 value by 1399003199 to compare it to your perl results, giving -2.23214643271162e-05. So, I suspect there is a weakness in the gnuplot algorithm that shows up when the magnitudes of the x and y values are very different.
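
To illustrate why the scale of x can matter, the sketch below computes the condition number of the 2x2 normal-equations matrix that a generic least-squares solver for y = m*x + b would work with. This is an assumption-laden illustration only: I have not checked how gnuplot solves its iteration steps internally, and the function name normal_matrix_condition is made up for this example. It compares the raw epoch seconds, the x / 1399003199 scaling used above, and dates centered on their mean and counted in days.

```python
import math
from fractions import Fraction

xs = [1399003199 + 86400 * i for i in range(7)]  # same dates as test.dat

def normal_matrix_condition(values):
    """2-norm condition number of [[sum v^2, sum v], [sum v, n]]."""
    vals = [Fraction(v) for v in values]  # exact arithmetic for the sums
    n = len(vals)
    s1 = sum(vals)
    s2 = sum(v * v for v in vals)
    trace = float(s2 + n)
    det = float(n * s2 - s1 * s1)
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    largest = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0
    smallest = det / largest  # avoids cancellation in (trace - sqrt(...))
    return largest / smallest

mean = Fraction(sum(xs), len(xs))
raw = normal_matrix_condition(xs)
scaled = normal_matrix_condition([Fraction(x, xs[0]) for x in xs])
days = normal_matrix_condition([(x - mean) / 86400 for x in xs])

print(f"raw epoch seconds:      {raw:.3g}")
print(f"x / first x:            {scaled:.3g}")
print(f"centered, in days:      {days:.3g}")
```

Under these assumptions the raw matrix should come out many orders of magnitude worse conditioned than the x / 1399003199 version, and centering the dates and counting in days should improve it further.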

Fitting the same data using the lm function in R gives the following output (without giving initial guesses, and without normalizing the x values):

> x
[1] 1399003199 1399089599 1399175999 1399262399 1399348799
[6] 1399435199 1399521599
> y
[1]  50  63 120  55  60  63  52
> lm(y ~ x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
  3.130e+04   -2.232e-05

> m <- lm(y ~ x)
> summary(m)

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5        6        7 
-21.9286  -7.0000  51.9286 -11.1429  -4.2143   0.7143  -8.3571 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.130e+04  8.026e+04   0.390    0.713
x           -2.232e-05  5.736e-05  -0.389    0.713

Residual standard error: 26.22 on 5 degrees of freedom
Multiple R-squared:  0.0294,    Adjusted R-squared:  -0.1647 
F-statistic: 0.1514 on 1 and 5 DF,  p-value: 0.7132

which is similar to your perl result. For this dataset, the standard errors are large compared to the fitted values: about 2.6 times the parameter values (5.736e-05 against 2.232e-05 for the slope). This means that you could change the fitted slope and intercept by 2 to 3 times their values and still get a fit of similar quality. Ideally, standard errors should be smaller than your fit parameter values.
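
The estimates and standard errors in the R output can be reproduced from the textbook closed-form formulas for a straight-line least-squares fit. The Python sketch below is only an illustration under that assumption; it is not the code used by Statistics::LineFit, gnuplot, or R, and the function name line_fit is made up for this example.

```python
import math

# Data copied from test.dat: epoch seconds and the "notkept" column.
xs = [1399003199, 1399089599, 1399175999, 1399262399,
      1399348799, 1399435199, 1399521599]
ys = [50, 63, 120, 55, 60, 63, 52]

def line_fit(xs, ys):
    """Closed-form least-squares line with standard errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centering x before summing keeps the arithmetic well behaved.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))  # residual standard error
    se_slope = sigma / math.sqrt(sxx)
    se_intercept = sigma * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    return slope, intercept, se_slope, se_intercept, sigma

slope, intercept, se_slope, se_intercept, sigma = line_fit(xs, ys)
print(f"slope     = {slope:.4g} +/- {se_slope:.4g}")
print(f"intercept = {intercept:.4g} +/- {se_intercept:.4g}")
print(f"residual standard error = {sigma:.4g}")
print(f"se/|slope| = {se_slope / abs(slope):.2f}")
```

The last line prints the ratio of the slope's standard error to the slope itself, which is the quantity discussed above.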

In reply to Re: Statistics::LineFit versus gnuplot, results differ by kevbot
in thread Statistics::LineFit versus gnuplot, results differ by neilwatson
