in reply to Re: Equality checking for strings AND numbers
in thread Equality checking for strings AND numbers

Good observation. Most values are integers, but with different precisions. Real numbers SHOULD have the same precision in these files, and I actually want to detect when they don't, i.e. 10 and 10.000000000000001 should be treated as different. If exact comparison on reals becomes an issue, I guess I could use sprintf to compare only the leading decimal places, or do a ratio comparison. Thanks for the heads-up.
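For illustration, the two fallbacks mentioned here (sprintf rounding and a ratio comparison) might look roughly like the sketch below. The helper names same_to_places and same_by_ratio, and the tolerances used, are made up for this example and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two numbers after formatting both to
# $places decimal places with sprintf.
sub same_to_places {
    my ($x, $y, $places) = @_;
    return sprintf("%.${places}f", $x) eq sprintf("%.${places}f", $y) ? 1 : 0;
}

# Hypothetical helper: compare by ratio, i.e. the relative difference
# must be smaller than $tol. A ratio is undefined when the divisor is
# zero, so fall back to exact comparison if either value is zero.
sub same_by_ratio {
    my ($x, $y, $tol) = @_;
    return ($x == $y ? 1 : 0) if $x == 0 || $y == 0;
    return abs($x / $y - 1) < $tol ? 1 : 0;
}

print same_to_places(10, 10.000000000000001, 6) ? "same\n" : "different\n";
print same_by_ratio(1e9, 1e9 + 1, 1e-6)         ? "same\n" : "different\n";
```

One caveat with the sprintf variant: two values that sit on either side of a rounding boundary format differently even when they are extremely close, so it is not a true "within tolerance" test.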

Re^3: Equality checking for strings AND numbers
by lin0 (Curate) on Jul 13, 2007 at 12:31 UTC

    When comparing numbers, I tend to avoid == and use something like:

    sub equality { my ($x, $y, $eps) = @_; return abs($x - $y) < $eps ? 1 : 0; }

    where $eps is the desired precision, i.e. two values count as equal when they differ by less than $eps
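A small usage sketch of this check, assuming a perl built with ordinary 64-bit doubles. The arguments are named $x and $y here so they do not shadow the $a/$b package variables that sort uses, and the 1e-9 tolerance is an arbitrary choice for the example.

```perl
use strict;
use warnings;

# The "within epsilon" check: equal when the absolute difference
# between the two values is smaller than $eps.
sub equality {
    my ($x, $y, $eps) = @_;
    return abs($x - $y) < $eps ? 1 : 0;
}

# 0.1 and 0.2 have no exact binary representation, so their sum
# usually carries a tiny rounding error compared with 0.3.
my $sum = 0.1 + 0.2;

print "==        : ", ($sum == 0.3                ? "equal" : "different"), "\n";
print "equality(): ", (equality($sum, 0.3, 1e-9)  ? "equal" : "different"), "\n";
```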

    Cheers,

    lin0
      lin0's code above is generally referred to as the "Within Epsilon" check, and is an important advancement in understanding how to compare floating point numbers. It may be the best you can do in pure Perl, at least with any hope of runtime performance.

      However, the Within Epsilon family of checks is terribly sensitive to the actual values involved. Choosing the threshold epsilon properly (the value of $eps in lin0's code) is important and, unfortunately, depends on the values you are trying to compare.

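      An epsilon of 0.00000000000001 will not be useful for larger numbers in the billions. The mantissa has a fixed number of bits, so the larger the exponent, the coarser the absolute spacing between representable values: for a double near one billion, neighbouring values are roughly 1e-7 apart, far more than that epsilon.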
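The effect can be seen with a short sketch, assuming IEEE 754 doubles. The helper name within_eps is made up for this example; the spacings 2**-52 and 2**-21 are the gaps between neighbouring doubles just above 1.0 and just above 4e9 respectively.

```perl
use strict;
use warnings;

# Absolute "within epsilon" check (hypothetical helper name).
sub within_eps {
    my ($x, $y, $eps) = @_;
    return abs($x - $y) < $eps ? 1 : 0;
}

my $eps = 1e-14;

# Near 1.0, neighbouring doubles are 2**-52 (about 2.2e-16) apart,
# so an epsilon of 1e-14 still absorbs a few units of rounding error.
my $near_one = within_eps(1.0, 1.0 + 2**-52, $eps);

# Near 4e9, neighbouring doubles are 2**-21 (about 4.8e-7) apart.
# Even the closest possible neighbour falls outside the epsilon, so
# the check behaves exactly like ==.
my $near_4e9 = within_eps(4e9, 4e9 + 2**-21, $eps);

print "near 1.0: ", ($near_one ? "equal" : "different"), "\n";
print "near 4e9: ", ($near_4e9 ? "equal" : "different"), "\n";
```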

      Anyone who wants to know more about comparing IEEE floating point numbers in software "the right way" should have a brief read through http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm which includes nice example implementations and discussions of useful (and reasonably fast) C "almost equal" checks that will outperform and adapt better than the classic and naive Within Epsilon technique.
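One way to make the check adapt to magnitude is to count how many representable doubles lie between the two values (their distance in "units in the last place") by reinterpreting the bit patterns as integers. The sketch below is a pure-Perl illustration of that idea, not a port of any C code; the helper names are hypothetical, and it assumes a perl with 64-bit integer support, IEEE 754 doubles, and inputs that are not NaN or infinite.

```perl
use strict;
use warnings;

# Hypothetical helper: map a double to a signed integer so that
# adjacent doubles map to adjacent integers and ordering is preserved.
sub double_to_ordered_int {
    my ($x) = @_;
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer
    # (both formats use native byte order, so they line up).
    my $bits = unpack 'q', pack 'd', $x;
    # Doubles are sign-magnitude: for negative values, strip the sign
    # bit and negate the remaining magnitude.
    return $bits < 0 ? -($bits & 0x7FFF_FFFF_FFFF_FFFF) : $bits;
}

# Hypothetical helper: equal when at most $max_ulps representable
# doubles separate the two values.
sub almost_equal_ulps {
    my ($x, $y, $max_ulps) = @_;
    my $distance = abs(double_to_ordered_int($x) - double_to_ordered_int($y));
    return $distance <= $max_ulps ? 1 : 0;
}

# The same tolerance of one step works near 1.0 and near 4e9.
print almost_equal_ulps(1.0, 1.0 + 2**-52, 1) ? "equal\n" : "different\n";
print almost_equal_ulps(4e9, 4e9 + 2**-21, 1) ? "equal\n" : "different\n";
print almost_equal_ulps(1.0, 1.0 + 1e-9,   4) ? "equal\n" : "different\n";
```

The pack/unpack round trip on every call is likely to be slow compared with a plain subtraction, so this is better read as a demonstration of the idea than as a performance recommendation.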

      --
      [ e d @ h a l l e y . c c ]

        Hi halley,

        Thank you for the reference. It is really useful!

        In my case, the "within epsilon" check works reasonably well because most of the time I normalize the data to have zero mean and unit standard deviation. The rest of the time I work with fuzzy sets, which are constrained to the interval [0, 1].

        Cheers,

        lin0
        Sounds like a good module candidate, probably using XS?