swl has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone know if there are perl bindings to MINPACK or some other curve fitting process?

So far I've been able to track down PDL::Fit::LM, PDL::Fit::Levmar and Starlink::AST. However, all have issues for my use case where I want to fit non-linear curves to a large number (thousands) of data sets, and for which I need to try a variety of functional shapes to identify which one fits best (four parameter logistic, power curve, etc).

PDL::Fit::Levmar does not index correctly on metacpan, so cpanm does not find it (nor does cpan). syphilis has windows PPMs available. However, the distro itself has several open issues for recent versions of PDL and it is unclear if it is still maintained. Until that's all sorted out I'm reluctant to use it in anything that will eventually need automated installation via cpanm.

lmfit from PDL::Fit::LM, but it is sensitive to the starting conditions. Specification of the formula is also not simple, as one needs to provide a subroutine for the derivatives (although one can use Math::Symbolic to generate one).

Starlink::AST includes cminpack, but appears to use a limited subset for standard polynomials. It also seems not to call it from pure perl code.

There is also Algorithm::CurveFit, but from memory is also sensitive to the starting conditions.

Code using PDL::Fit::Levmar and PDL::Fit::LM with a sample of my data is below. Running the same data through minpack via the nlsLM function in the minpack.lm package for R gives good results without being sensitive to the starting conditions (see second code block and results). minpack.lm is just a set of bindings around the minpack library, so I would expect the same results are possible for perl if bindings were to exist.

# perl code #use 5.024; use PDL; use PDL::NiceSlice; #use PDL::GSL::INTERP; use PDL::Fit::LM; use PDL::Stats; my @data = <DATA>; my $header = shift @data; my (@xarr, @yarr); foreach my $line (@data) { chomp $line; my @line_arr = split /\s+/, $line; push @xarr, $line_arr[0]; push @yarr, $line_arr[1]; } my $len = @xarr; my @target_iters = (0 .. $#xarr); my $x = pdl(@xarr[@target_iters]); my $y = pdl(@yarr[@target_iters]); my @init = (-1, 1, 2134, 2.27395324135227); # try levmar { my ($a, $b, $c, $d); use PDL::Fit::Levmar; my $func = sub { my ($p, $yy, $x) = @_; my ($a,$b, $c, $d) = $p->list; $yy = $d + ($a - $d) / (1 + ($x / $c)**$b); }; my $yy = pdl (@yarr[@target_iters]); my $params = pdl (@init); my $result_hash = levmar( $params, $yy, $x, FUNC => $func, ); say "------LEVMAR REPORT:----"; print levmar_report($result_hash); say "-----END LEVMAR REPORT----\n\n"; } # now try lmfit my ($ym, $finalp, $covar, $iters); say '----PDL::Fit::LM----'; while (1) { my $initp = pdl \@init; say "lmfit trying initp: $initp"; my $sigma = 1; ($ym, $finalp, $covar, $iters) = lmfit $x, $y, $sigma, \&symm_sig_func, $initp, {Maxiter => 300, Eps => 1e-10}; #say $finalp; #say $iters; #say $covar; #say $ym->sum; #my ($a, $b, $c, $d) = @{$finalp->unpdl}; #my $ppred = $d + ($a - $d) / (1 + ($x / $c)**$b); #say $ym; last if ($ym->sum ne 'NaN'); # try some smaller starting conditions $init[2] /= 2; $init[3] /= 2; } say "lmfit finalp: $finalp"; say "iters: $iters"; #say $ym->(0:10); #say 'RSQ: ' . (1 - ($y - $ym)->stdv / $y->stdv); my $meany = $y->average; say 'RSQ: ' . (1 - (($y - $ym)**2)->sum / (($y - $meany)**2)->sum); my ($a, $b, $c, $d) = @{$finalp->unpdl}; my $predx = pdl (@xarr); my $obsy = pdl (@yarr); my $pred = $d + ($a - $d) / (1 + ($predx / $c)**$b); my $pp = $pred->unpdl; #say join ' ', @$pp[0..10]; $meany = $obsy->average; my $fit_rsq = (1 - (($obsy - $pred)**2)->sum / (($obsy - $meany)**2)- +>sum); #my $aa = (1 - (($obsy - $pred)**2)->sum / (($obsy->stdv)*$obsy->nelem +)**2)->flat; say "lmfit RSQ: $fit_rsq"; say 'lmfit RMSE: ' . (($obsy - $pred)**2)->average; sub symm_sig_func { # leave this line as is my ($x, $par, $ym, $dyda) = @_; my ($a, $b, $c, $d) = @{$par->(0:3)->unpdl}; #$c = List::Util::max (0, $c); # avoid log of neg vals # Write function with dependent variable $ym, # independent variable $x, and fit parameters as specified above. # Use the .= (dot equals) assignment operator to express the equal +ity # (not just a plain equals) $ym .= $d + ($a - $d) / (1 + ($x / $c)**$b); my @dy = map {$dyda->(,($_))} (0..3); #my @dy = @{$dyda->unpdl}; # Partial derivative of the function with respect to parameters. # Again, note .= assignment operator (not just "equals") # # http://www.numberempire.com/derivativecalculator.php # but could use Math::Symbolic # formula: d+(a-d)/(1+(x/c)^b) # del(a) # c^b/(x^b+c^b) $dy[0] .= $c**$b / ($x**$b + $c**$b); # del(b) # ( (c^b*d-a*c^b)*x^b*log(x) # +(a*c^b*log(c) # -c^b*log(c)*d)*x^b) # / (x^(2*b)+2*c^b*x^b+c^(2*b)) $dy[1] .= eval { ( ($c**$b*$d-$a*$c**$b)*$x**$b*log($x) + ($a*$c**$b*log($c)-$c**$b*log($c)*$d)*$x**$b ) / ($x**(2*$b)+2*$c**$b*$x**$b+$c**(2*$b)) }; # del(c) # -((b*c^b*d-a*b*c^b)*x^b) # / (c*x^(2*b)+2*c^(b+1)*x^b+c^(2*b+1)) $dy[2] .= -1 * (($b * $c**$b * $d - $a * $b * $c**$b) * $x**$b) / ($c * $x**(2 * $b) + 2 * $c**($b + 1) * $x**$b + $c**(2 * $b + 1) ); #say 'DEL C: ' . $dy[2]; # del(d) # x^b/(x^b+c^b) $dy[3] .= $x**$b / ($x**$b + $c**$b); } __DATA__ x est 1 0.100593962 42 1.154242343 83 1.474328422 124 1.654247992 165 1.775551897 207 1.866286312 248 1.934367102 289 1.988797445 330 2.033521036 371 2.071092313 412 2.103249977 454 2.131859705 495 2.15646498 536 2.178448777 577 2.198304263 619 2.21682969 660 2.233437802 701 2.248813997 742 2.263134718 783 2.276540152 825 2.289441049 866 2.301316212 907 2.312557085 948 2.323226159 989 2.333376276 1031 2.343283185 1072 2.352515249 1113 2.361348508 1154 2.369813742 1195 2.377938641 1237 2.385935063 1278 2.393445607 1319 2.400685418 1360 2.407673856 1401 2.414428798 1443 2.421123656 1484 2.42745521 1525 2.433599534 1566 2.43956963 1607 2.445377464 1648 2.451034016 1649 2.451170177 1690 2.456681041 1731 2.462054198 1772 2.467293088 1813 2.472401069 1855 2.477501313 1896 2.482354207 1937 2.487085836 1978 2.491699229 2019 2.496197342 2061 2.500688643 2102 2.504962126 2143 2.509128821 2184 2.513191399 2225 2.517152459 2266 2.52101454 2308 2.524870771 2349 2.528539984 2390 2.53211751 2431 2.53560564 2472 2.539006607 2514 2.542402424 2555 2.545633551 2596 2.548783938 2637 2.551855602 2678 2.554850511 2720 2.557840884 2761 2.560686231 2802 2.563460478 2843 2.5661654 2884 2.568802732 2926 2.57143607 2967 2.573941696 3008 2.576384711 3049 2.57876668 3090 2.581089127 3132 2.583408057 3173 2.585614525 3214 2.587765856 3255 2.58986343

This results in:

Use of uninitialized value $ninf[6] in sprintf at c:/berrybrew/5.26.1_ +64_pdl/perl/site/lib/PDL/Fit/Levmar.pm line 1250, <DATA> line 82. ------LEVMAR REPORT:---- Estimated parameters: [-1 1 2134 2.2739532] Covariance: [ [-Inf -Inf NaN Inf] [ Inf NaN NaN NaN] [ Inf Inf Inf -Inf] [-Inf Inf NaN NaN] ] ||e||_2 at initial parameters = Inf Errors at estimated parameters: ||e||_2 = Inf ||J^T e||_inf = 0 ||Dp||_2 = 1.79769e+308 \mu/max[J^T J]_ii ] = 0 number of iterations = 0 reason for termination: = number of function evaluations = 1 number of jacobian evaluations = 0 number of linear systems solved = 0 -----END LEVMAR REPORT---- ----PDL::Fit::LM---- lmfit trying initp: [-1 1 2134 2.2739532] lmfit trying initp: [-1 1 1067 1.1369766] lmfit trying initp: [-1 1 533.5 0.56848831] lmfit trying initp: [-1 1 266.75 0.28424416] lmfit trying initp: [-1 1 133.375 0.14212208] lmfit trying initp: [-1 1 66.6875 0.071061039] lmfit finalp: [-0.2066481 0.54367392 60.393814 2.8999219] iters: 14 RSQ: 0.999354122952557 lmfit RSQ: 0.999354122952557 lmfit RMSE: 8.35455913578738e-005

minpack via R:

# R CODE my.data = ' x y 1 0.100593962 42 1.154242343 83 1.474328422 124 1.654247992 165 1.775551897 207 1.866286312 248 1.934367102 289 1.988797445 330 2.033521036 371 2.071092313 412 2.103249977 454 2.131859705 495 2.15646498 536 2.178448777 577 2.198304263 619 2.21682969 660 2.233437802 701 2.248813997 742 2.263134718 783 2.276540152 825 2.289441049 866 2.301316212 907 2.312557085 948 2.323226159 989 2.333376276 1031 2.343283185 1072 2.352515249 1113 2.361348508 1154 2.369813742 1195 2.377938641 1237 2.385935063 1278 2.393445607 1319 2.400685418 1360 2.407673856 1401 2.414428798 1443 2.421123656 1484 2.42745521 1525 2.433599534 1566 2.43956963 1607 2.445377464 1648 2.451034016 1649 2.451170177 1690 2.456681041 1731 2.462054198 1772 2.467293088 1813 2.472401069 1855 2.477501313 1896 2.482354207 1937 2.487085836 1978 2.491699229 2019 2.496197342 2061 2.500688643 2102 2.504962126 2143 2.509128821 2184 2.513191399 2225 2.517152459 2266 2.52101454 2308 2.524870771 2349 2.528539984 2390 2.53211751 2431 2.53560564 2472 2.539006607 2514 2.542402424 2555 2.545633551 2596 2.548783938 2637 2.551855602 2678 2.554850511 2720 2.557840884 2761 2.560686231 2802 2.563460478 2843 2.5661654 2884 2.568802732 2926 2.57143607 2967 2.573941696 3008 2.576384711 3049 2.57876668 3090 2.581089127 3132 2.583408057 3173 2.585614525 3214 2.587765856 3255 2.58986343 ' data = read.delim(textConnection(my.data),header=TRUE,sep="\t",strip.w +hite=TRUE) ##install.packages("minpack.lm") library(minpack.lm) init_cond = list(a = -1, b = 1, c = 2134, d = 2.27395324135227) formula = as.formula ("y ~ d + (a - d) / (1 + (x / c)**b)") nlsLM(formula, data=data, start=init_cond)

Results in:

> nlsLM(formula, data=data, start=init_cond) Nonlinear regression model model: y ~ d + (a - d)/(1 + (x/c)^b) data: data a b c d -0.2066 0.5437 60.3938 2.8999 residual sum-of-squares: 0.006767 Number of iterations to convergence: 16 Achieved convergence tolerance: 1.49e-08

Replies are listed 'Best First'.
Re: perl minpack bindings or other function fitting code?
by swl (Prior) on Nov 06, 2017 at 02:49 UTC

    Given the lack of responses, I'll assume there are no bindings, or at least none that are obvious. Some further searching has turned up a few related modules, but none that implement minpack type functionality (see below).

    In terms of the specific problem I described, the fitting issues were partly a PEBKAC, and partly the general approach to derivatives used in the various fit routines.

    The PDL::Fit::LM code in my example needed a bounds constraint to ensure the partial derivative with respect to $c avoided log(0). Setting it to $c = max (0.000000001, $c) before any calculations resulted in convergence on the first attempt. I'd already tried bounds in an earlier iteration to avoid negative values (line was commented out in the code) but used 0 instead of a small positive value, hence the partial derivative with respect to $c could be set to infinity because that is how PDL handles log(0), as opposed to croaking as plain perl does.

    PDL::Fit::Levmar also supports bounds, but I did not get it to work after a cursory attempt. The fact that it does not install cleanly via cpanm means that I won't look at it further for my needs in any case. search.cpan.org also cannot locate it, but it is at http://search.cpan.org/~jlapeyre/PDL-Fit-Levmar-0.0096/ if people want to see the details.

    After continuing to look into the general MINPACK process, that library uses finite difference methods to calculate the derivatives instead of user specified functions. It should be possible to adapt the PDL routine to do the same, but that's probably something to take to the PDL list if I don't work it out myself.

    And to provide a bit more detail in answer to my initial query, there are levenberg-marquardt fitting routines in the GNU Standard Library, but they are currently not included in the perl bindings. I don't know if there are plans to do so, but assume that if it were easy to do then it would likely have been done already.

    The CEPHES library also includes the lmdif code from MINPACK, but the Math::CEPHES library does not include that as part of the subset it provides.

    The final point perhaps worth noting is that the bounded PDL code takes 68 iterations to converge given my original starting parameters, whereas the minpack code via R takes 16. I'm assuming this is primarily due to differences in the settings used, for example the step size, but the minpack code does use other modifications. Setting the starting values to [-1,1,1,2.27395324135227] causes PDL::Fit::LM to use 22 iterations and the R version to use 21.

      This is an update for people who find this thread in the future.

      PDL-Fit-Levmar has been updated and is now indexed on metacpan.

      It also works best of the modules I tried, converging on a solution without being sensitive to starting conditions.

      My previous issues were due to my own syntax errors. The main thing is that the code examples use the form x = f(t), not y = f(x) which I was assuming when I was testing, and hence my fits were using reversed axes. Such are the consequences of not reading the manual closely enough...

      We're always open to more/better functions in the GSL bindings for PDL! If you'd like to help out (I promise it's pretty easy, the existing bindings offer a great pattern to follow), please join one of the mailing lists - see PDL for links.