sdani has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

My script generates a tons of histograms like this: http://kepfeltoltes.hu/130518/pAKT-tAKT_www.kepfeltoltes.hu_.png

As you can see there are two lognormal distribution on top of each other and I need the parameters of those distribution. To get that, I was trying to use Algorithm::CurveFit function, but the the Math::Symbolic package falis to parse my equation. (That I got from the wikipedia page: http://en.wikipedia.org/wiki/Log-normal_distribution#Probability_density_function)

I think I am making some mistakes when I phrasing the equation (I was trying the single lognormal first).

my $equation = '(exp(-1 * (log(x) - m)**2 / (2 * r ** 2)))/(x * r * 2. +5066)'; # x - independent variable # m - mean # r - standard dev.

Here is a sample dataset:

0.0643566964735115 1 0.078213413896854 2 0.0862233698701667 2 0.0950536377503222 5 0.104788226940961 12 0.115519750377919 35 0.127350305630187 52 0.140392446235774 116 0.154770252513572 160 0.17062051203871 187 0.188094021012202 250 0.207357018905863 240 0.228592770031414 258 0.252003307080518 252 0.277811353223423 251 0.306262441052688 246 0.337627248531191 197 0.372204174168167 182 0.410322175922981 184 0.4523439008454 167 0.498669135227145 130 0.549738608088756 159 0.606038184187406 130 0.668103486437085 102 0.736524991717946 123 0.811953648555319 118 0.895107070113873 114 0.986776361425825 111 1.0878336458061 111 1.19924036205923 102 1.32205641141595 76 1.4574502412217 98 1.60670996131109 81 1.7712555988274 84 1.95265260807701 68 2.15262676394876 47 2.37308058059129 42 2.61611141155243 45 2.88403140358156 29 3.17938949393169 33 3.50499566043902 19 3.86394765509039 19 4.25966047541651 10 4.69589885409594 8 5.17681307586923 7 5.70697846251846 3 6.29143890156414 6 7.64605614922042 2 8.42910051556786 1

Was there some error with the equation? Should I use other package to do it? What would be the best way to go? I really need wisdom here! :) Thanks for your help!

Daniel

Replies are listed 'Best First'.
Re: fitting double lognormal distribution on a dataset
by hdb (Monsignor) on May 18, 2013 at 11:14 UTC

    I have not used that package but the documentation of Algorithm::CurveFit seems to specify that powers need to be written as x^n and not x**n.

      G'day hdb,

      ... the documentation of Algorithm::CurveFit seems to specify that powers need to be written as x^n and not x**n.

      Firstly, I will say that the first line of code in the Algorithm::CurveFit SYNOPSIS:

      my $formula = 'c + a * x^2';

      is written in a way that suggests this intent

      $ perl -MO=Deparse,-p -e 'my $formula = $c + $a * $x**2;' (my $formula = ($c + ($a * ($x ** 2)))); -e syntax OK

      more than it does this one

      $ perl -MO=Deparse,-p -e 'my $formula = $c + $a * $x^2;' (my $formula = (($c + ($a * $x)) ^ 2)); -e syntax OK

      However, further down they have

      The formula should be a string that can be parsed by Math::Symbolic.

      The doco for that module has

      Warning: The operator to use for exponentiation is the normal Perl operator for exponentiation **, NOT the caret ^ which denotes exponentiation in the notation that is recognized by the Math::Symbolic parsers! The ^ operator will be interpreted as the normal binary xor.

      [Update (cosmetic): 2 instances of quoting my own code — <blockquote>s removed.]

      -- Ken

        Thanks for the clarification. I find this documentation very confusing. If I see an example like '1/2 * m * v^2' I believe that the caret is the operator for exponentiation and I do not see any need to even continue reading. Obviously wrong here.

        A fully working example would be useful here to test such presumptions...

      That's funny. I was looking for the documentation of Math::Symbolic and they were writing that the regular ** is for the power not ^, that is the xor logical gate. Now I'll definitely try. Thanks.