Re: Statistics::LineFit versus gnuplot, results differ

To compare your Perl and gnuplot results more directly, I took the epoch time values and placed them in the test.dat file.

test.dat

# date     notkept hosts
1399003199 50      10
1399089599 63      11
1399175999 120     12
1399262399 55      20
1399348799 60      22
1399435199 63      25
1399521599 52      24

Then I performed the fit with this simpler gnuplot script.

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale
p(x) = m1*x + b1
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

I get the following:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = 4.65548e-08      +/- 5.823e-05    (1.251e+05%)
b1              = 1                +/- 8.148e+04    (8.148e+06%)

Adding initial guesses to the gnuplot script results in a slope value that is closer to the perl result:

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale

p(x) = m1*x + b1
m1 = -1e-5
b1 = 30000
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -2.13926e-05     +/- 5.736e-05    (268.1%)
b1              = 30000            +/- 8.027e+04    (267.6%)

Not all curve-fitting algorithms are equal. These results suggest that the gnuplot algorithm is more sensitive to the initial guesses than the Perl algorithm, or that the Perl algorithm generates better initial guesses from the input data. This is just an educated guess; I have not looked at the code for either algorithm.

If I normalize the x values by dividing them by the first x value, which brings them closer in magnitude to the y values, I get a result that is closer to that of your Perl code.

#!/usr/bin/gnuplot

set xlabel "Date"
set ylabel "Count"

set autoscale

p(x) = m1 * (x / 1399003199)  + b1
fit p(x) 'test.dat' using 1:2 via m1, b1

plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -31227.8         +/- 8.025e+04    (257%)
b1              = 31299.7          +/- 8.026e+04    (256.4%)

Note that you need to divide the m1 value by 1399003199 to compare it to your Perl results, which gives -2.23214643271162e-05. So I suspect there is a weakness in the gnuplot algorithm that shows up when the magnitudes of the x and y values are very different.
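
As a cross-check that involves no iterative fitter and no initial guesses at all, here is a minimal Python sketch of the closed-form least-squares line for the same data. Python is not part of the original thread, and the function and variable names (linefit, m_raw, m_scaled and so on) are my own. I have not checked whether Statistics::LineFit or gnuplot work this way internally. The sketch fits once against the raw epoch seconds and once against x divided by the first x value, then rescales the second slope the way described above.

```python
# Closed-form least squares for y = m*x + b (no iteration, no initial guesses).
# All names below are illustrative; they are not taken from gnuplot, R or Perl.

def linefit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Centering x before forming the sums avoids subtracting
    # huge, nearly equal numbers from each other.
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Columns 1 and 2 of test.dat
x = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]
y = [50, 63, 120, 55, 60, 63, 52]

# Fit against the raw epoch seconds.
m_raw, b_raw = linefit(x, y)

# Fit against x divided by the first x value, as in the gnuplot script.
x0 = x[0]
m_scaled, b_scaled = linefit([xi / x0 for xi in x], y)

print("raw:    m = %.6e  b = %.1f" % (m_raw, b_raw))
print("scaled: m = %.1f  b = %.1f  m/x0 = %.6e"
      % (m_scaled, b_scaled, m_scaled / x0))
```

By my arithmetic, both routes give a slope of about -2.232e-05 per second and an intercept of about 31300, in line with the normalized gnuplot fit. The exact least-squares answer does not depend on how x is scaled, so the differences seen above come from how a fitter gets there, not from the data.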

Fitting the same data using the lm function in R gives the following output (without giving initial guesses, and without normalizing the x values):

> x
[1] 1399003199 1399089599 1399175999 1399262399 1399348799
[6] 1399435199 1399521599
> y
[1]  50  63 120  55  60  63  52
> lm(y ~ x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
  3.130e+04   -2.232e-05

> m <- lm(y ~ x)
> summary(m)

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5        6        7 
-21.9286  -7.0000  51.9286 -11.1429  -4.2143   0.7143  -8.3571 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.130e+04  8.026e+04   0.390    0.713
x           -2.232e-05  5.736e-05  -0.389    0.713

Residual standard error: 26.22 on 5 degrees of freedom
Multiple R-squared:  0.0294,    Adjusted R-squared:  -0.1647 
F-statistic: 0.1514 on 1 and 5 DF,  p-value: 0.7132

which is similar to your Perl result. For this dataset, the standard errors are large compared to the fitted values: about 2 to 3 times the parameter values. This means that you could change the fitted slope and intercept by 2 to 3 times and get a similar quality of fit. The p-value of 0.713 in the R summary says the same thing: the slope cannot be distinguished from zero. Ideally, standard errors should be smaller than your fit parameter values.
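
To make the standard-error point concrete, here is a small Python sketch that computes the slope, its standard error and the t value from the textbook simple-regression formulas. Again, Python and the names used (slope_stats, se_slope and so on) are my own choices for illustration, not something from the tools above.

```python
import math

def slope_stats(xs, ys):
    """Return (slope, se_slope, t_value, resid_se) for the line y = m*x + b."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Residuals in centered form, to avoid cancellation with the large intercept.
    rss = sum(((y - ybar) - slope * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    resid_se = math.sqrt(rss / (n - 2))      # n - 2 degrees of freedom
    se_slope = resid_se / math.sqrt(sxx)     # standard error of the slope
    return slope, se_slope, slope / se_slope, resid_se

# Columns 1 and 2 of test.dat
x = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]
y = [50, 63, 120, 55, 60, 63, 52]

slope, se_slope, t_value, resid_se = slope_stats(x, y)
print("slope = %.4e +/- %.4e" % (slope, se_slope))
print("t = %.3f, |se/slope| = %.2f, residual se = %.2f"
      % (t_value, abs(se_slope / slope), resid_se))
```

If I have the arithmetic right, this gives a standard error of about 5.736e-05 on a slope of about -2.232e-05, which is the ratio of roughly 2.6 that both gnuplot (with initial guesses) and R report.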
