To compare your perl and gnuplot results more directly, I took the epoch time values and placed them in the test.dat file.

test.dat

# date notkept hosts
1399003199 50 10
1399089599 63 11
1399175999 120 12
1399262399 55 20
1399348799 60 22
1399435199 63 25
1399521599 52 24

Then I performed the fit with this simpler gnuplot script.

#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1*x + b1
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)

I get the following:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
m1              = 4.65548e-08      +/- 5.823e-05    (1.251e+05%)
b1              = 1                +/- 8.148e+04    (8.148e+06%)

Adding initial guesses to the gnuplot script results in a slope value that is closer to the perl result:

#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1*x + b1
m1 = -1e-5
b1 = 30000
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
m1              = -2.13926e-05     +/- 5.736e-05    (268.1%)
b1              = 30000            +/- 8.027e+04    (267.6%)

Not all curve-fitting algorithms are equal. This suggests that the gnuplot algorithm is more sensitive to the initial guesses than the perl algorithm, or that the perl algorithm generates better initial guesses from the input data than gnuplot does. This is just an educated guess; I have not looked at the code for either algorithm.

If I normalize the x-axis values by dividing them by the first x value, which brings the x values closer in magnitude to the y values, I get a result that is closer to that of your perl code.

#!/usr/bin/gnuplot
set xlabel "Date"
set ylabel "Count"
set autoscale
p(x) = m1 * (x / 1399003199) + b1
fit p(x) 'test.dat' using 1:2 via m1, b1
plot 'test.dat' using 1:2, p(x)

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
m1              = -31227.8         +/- 8.025e+04    (257%)
b1              = 31299.7          +/- 8.026e+04    (256.4%)

Note that you need to divide the m1 value by 1399003199 to compare it to your perl results, giving -2.23214643271162e-05. So, I suspect there is a weakness in the gnuplot algorithm that shows up when the magnitudes of the x and y values are very different.
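
As a cross-check on the rescaling step, here is a minimal Python sketch of a closed-form least-squares line fit applied to the same data, once on the raw epoch values and once on x divided by the first x value. The function name linear_fit is my own, and this is plain textbook least squares with centered sums; it is an assumption-laden illustration and not the code that gnuplot or Statistics::LineFit actually runs.

```python
# Closed-form ordinary least squares for y = m*x + b.
# Illustrative sketch only: not the algorithm used by gnuplot's fit
# command or by Statistics::LineFit.

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centering x before summing keeps the sums small even for epoch-sized x.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Columns 1 and 2 of test.dat (date, notkept).
x = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]
y = [50, 63, 120, 55, 60, 63, 52]

x0 = x[0]

# Fit against the raw epoch values.
m_raw, b_raw = linear_fit(x, y)

# Fit against x / x0, as in the normalized gnuplot script.
m_norm, b_norm = linear_fit([xi / x0 for xi in x], y)

print("raw:        m = %.6e  b = %.1f" % (m_raw, b_raw))
print("normalized: m = %.1f  b = %.1f" % (m_norm, b_norm))
# Dividing the normalized slope by x0 puts it back in per-second units.
print("normalized slope / x0 = %.6e" % (m_norm / x0))
```

Both fits describe the same line, so the intercepts agree and the normalized slope divided by x0 matches the raw slope.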

Fitting the same data using the lm function in R gives the following output (without giving initial guesses, and without normalizing the x values):

> x
[1] 1399003199 1399089599 1399175999 1399262399 1399348799
[6] 1399435199 1399521599
> y
[1]  50  63 120  55  60  63  52
> lm(y ~ x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
  3.130e+04   -2.232e-05

> m <- lm(y ~ x)
> summary(m)

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5        6        7
-21.9286  -7.0000  51.9286 -11.1429  -4.2143   0.7143  -8.3571

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.130e+04  8.026e+04   0.390    0.713
x           -2.232e-05  5.736e-05  -0.389    0.713

Residual standard error: 26.22 on 5 degrees of freedom
Multiple R-squared:  0.0294,	Adjusted R-squared:  -0.1647
F-statistic: 0.1514 on 1 and 5 DF,  p-value: 0.7132

This is similar to your perl result. For this dataset, the standard errors are high compared to the fit values: they are about 2 to 3 times the magnitude of the parameter values. This means that you could change the fit slope and intercept by 2 to 3 times their values and get a similar quality of fit. Ideally, standard errors should be smaller than your fit parameter values.
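
To show where those standard errors come from, here is a short Python sketch that recomputes them from the residuals using the usual textbook formulas for a straight-line fit. The variable names are my own, and I am assuming these are the same formulas behind the R summary shown above; I have not checked the R source.

```python
import math

# Columns 1 and 2 of test.dat (date, notkept).
x = [1399003199, 1399089599, 1399175999, 1399262399,
     1399348799, 1399435199, 1399521599]
y = [50, 63, 120, 55, 60, 63, 52]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Centered sums for the closed-form least-squares line.
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residuals, evaluated in centered form to avoid cancellation error.
residuals = [yi - (y_mean + slope * (xi - x_mean)) for xi, yi in zip(x, y)]

# Residual standard error with n - 2 degrees of freedom.
rse = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Textbook standard errors of the slope and intercept.
se_slope = rse / math.sqrt(sxx)
se_intercept = rse * math.sqrt(1.0 / n + x_mean ** 2 / sxx)

# How many times larger than the slope itself is its standard error?
ratio = se_slope / abs(slope)

print("residual standard error = %.2f" % rse)
print("slope     = %.3e +/- %.3e" % (slope, se_slope))
print("intercept = %.3e +/- %.3e" % (intercept, se_intercept))
print("se_slope / |slope| = %.2f" % ratio)
```

The last line prints the ratio between the slope's standard error and the slope itself, which is the "2 to 3 times" figure mentioned above.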

In reply to Re: Statistics::LineFit versus gnuplot, results differ by kevbot
in thread Statistics::LineFit versus gnuplot, results differ by neilwatson
