in reply to Statistics::LineFit versus gnuplot, results differ
To compare your perl and gnuplot results more directly, I took the epoch time values and placed them in the test.dat file.
test.dat
```
# date notkept hosts
1399003199 50 10
1399089599 63 11
1399175999 120 12
1399262399 55 20
1399348799 60 22
1399435199 63 25
1399521599 52 24
```

Then I performed the fit with this simpler gnuplot script:
```
#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1*x + b1
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)
```

I get the following:
```
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = 4.65548e-08      +/- 5.823e-05    (1.251e+05%)
b1              = 1                +/- 8.148e+04    (8.148e+06%)
```

Adding initial guesses to the gnuplot script results in a slope value that is closer to the perl result:
```
#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1*x + b1
m1 = -1e-5
b1 = 30000
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)
```
```
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -2.13926e-05     +/- 5.736e-05    (268.1%)
b1              = 30000            +/- 8.027e+04    (267.6%)
```

Not all curve fitting algorithms are equal. This suggests that the gnuplot algorithm is more sensitive to the initial guesses than the perl algorithm, or that the perl algorithm derives better initial guesses from the input data than gnuplot does. This is just an educated guess; I have not looked at the code for either algorithm.

If I normalize the x values by dividing them by the first x value, which brings them closer in magnitude to the y values, I get a result that is closer to that of your perl code:
```
#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1 * (x / 1399003199) + b1
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)
```
```
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = -31227.8         +/- 8.025e+04    (257%)
b1              = 31299.7          +/- 8.026e+04    (256.4%)
```
Note that you need to divide the m1 value by 1399003199 to compare it with your perl result, which gives -2.23214643271162e-05. So I suspect there is a weakness in the gnuplot algorithm that shows up when the x and y values differ greatly in magnitude.
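For a straight line, least squares also has a closed-form solution that needs no initial guesses at all, which gives a convenient sanity check for both tools. The Python sketch below is only an illustration of ordinary least squares on the test.dat values; it is not the code used by Statistics::LineFit, gnuplot or R, and the helper names `ols` and `ssr` are hypothetical. It fits the raw epoch seconds, fits the normalized values x / 1399003199 and divides the slope back, and compares the sum of squared residuals of the least-squares line with that of gnuplot's first (no initial guess) parameters.

```python
# Illustration only: plain closed-form ordinary least squares (OLS) for
# y = m*x + b. Not the actual algorithm of Statistics::LineFit, gnuplot or R.

X = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]  # test.dat column 1 (epoch seconds)
Y = [50, 63, 120, 55, 60, 63, 52]          # test.dat column 2 (notkept)


def ols(xs, ys):
    """Hypothetical helper: return (slope, intercept) using centered sums,
    so the ~1.4e9 epoch values are never squared directly."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def ssr(xs, ys, m, b):
    """Hypothetical helper: sum of squared residuals for the line m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Fit on the raw epoch seconds: no starting values are involved.
m_raw, b_raw = ols(X, Y)

# Fit on x / x0, as in the normalized gnuplot script, then convert back.
x0 = X[0]
m_norm, b_norm = ols([x / x0 for x in X], Y)
m_back = m_norm / x0

# Compare the least-squares line with gnuplot's first (default-start) result.
ssr_best = ssr(X, Y, m_raw, b_raw)
ssr_gnuplot = ssr(X, Y, 4.65548e-08, 1.0)

print(f"raw:        m = {m_raw:.6e}, b = {b_raw:.1f}")
print(f"normalized: m1 = {m_norm:.1f}, b1 = {b_norm:.1f}, m1/x0 = {m_back:.6e}")
print(f"SSR: least squares = {ssr_best:.2f}, gnuplot default start = {ssr_gnuplot:.2f}")
```

With raw epoch seconds the slope and intercept should agree with the R coefficients, and the normalized slope divided by x0 should agree with the raw slope. gnuplot's first parameters leave a noticeably larger sum of squared residuals, close to what a flat line at the mean of y would give.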
Fitting the same data with the lm function in R, without initial guesses and without normalizing the x values, gives the following output:
```
> x
[1] 1399003199 1399089599 1399175999 1399262399 1399348799
[6] 1399435199 1399521599
> y
[1]  50  63 120  55  60  63  52
> lm(y ~ x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
  3.130e+04   -2.232e-05

> m <- lm(y ~ x)
> summary(m)

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5        6        7
-21.9286  -7.0000  51.9286 -11.1429  -4.2143   0.7143  -8.3571

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.130e+04  8.026e+04   0.390    0.713
x           -2.232e-05  5.736e-05  -0.389    0.713

Residual standard error: 26.22 on 5 degrees of freedom
Multiple R-squared:  0.0294,  Adjusted R-squared:  -0.1647
F-statistic: 0.1514 on 1 and 5 DF,  p-value: 0.7132
```

This is similar to your perl result. For this dataset, the standard errors are large compared to the fitted values: about 2 to 3 times the parameter values. This means that you could change the fitted slope and intercept by 2 to 3 times and get a fit of similar quality. Ideally, the standard errors should be smaller than the fitted parameter values.
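The standard errors in the R summary can be reproduced with the usual textbook formulas for simple linear regression. The Python sketch below is an illustration under that assumption: it uses the centered closed-form fit rather than R's own QR-based computation, and all variable names are made up for the example. It computes the residual standard error, the standard errors and t values of the slope and intercept, and R-squared.

```python
import math

# test.dat column 1 (epoch seconds) and column 2 (notkept)
X = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]
Y = [50, 63, 120, 55, 60, 63, 52]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n
sxx = sum((x - x_mean) ** 2 for x in X)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))

# Least-squares slope and intercept (textbook closed form, not R's QR method)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residuals and sums of squares
residuals = [y - (intercept + slope * x) for x, y in zip(X, Y)]
ssr = sum(r * r for r in residuals)        # residual sum of squares
sst = sum((y - y_mean) ** 2 for y in Y)    # total sum of squares
dof = n - 2                                # two fitted parameters
sigma = math.sqrt(ssr / dof)               # R's "Residual standard error"

# Standard errors of the two coefficients
se_slope = sigma / math.sqrt(sxx)
se_intercept = sigma * math.sqrt(1.0 / n + x_mean ** 2 / sxx)

t_slope = slope / se_slope
t_intercept = intercept / se_intercept
r_squared = 1.0 - ssr / sst

print(f"residual standard error: {sigma:.2f} on {dof} degrees of freedom")
print(f"intercept: {intercept:.4g}  se {se_intercept:.4g}  t {t_intercept:.3f}")
print(f"slope:     {slope:.4g}  se {se_slope:.4g}  t {t_slope:.3f}")
print(f"R-squared: {r_squared:.4f}")
print(f"se/|estimate|: {se_intercept / abs(intercept):.2f}, {se_slope / abs(slope):.2f}")
```

The last line prints the ratio of each standard error to its estimate, which comes out at roughly 2.6 for both coefficients, consistent with the "about 2 to 3 times" observation (equivalently, |t| is only about 0.39).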