Test::More::is compares using eq, so it's doing stringification on the floating point values. Different builds of perl can stringify differently in the far-out place values. So even if your module is returning exactly the same value, the string comparison might be different on two different systems.
IF you want numerical equivalence testing, you should use cmp_ok instead -- but that's a big "IF".
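To make the difference concrete, here is a minimal sketch of the two comparison styles; the value 0.99999999999999996 is just a stand-in for whatever your module returns, and neither line by itself cures the rounding problem -- it only shows which comparison each function performs:

use strict;
use warnings;
use Test::More;

my $got = 0.99999999999999996;   # stand-in for the value your module returns

# is() stringifies both sides and compares them with 'eq', so the result
# depends on how this particular perl build stringifies the float:
is( $got, 1, "string comparison via is()" );

# cmp_ok() with '==' compares the two values numerically instead:
cmp_ok( $got, '==', 1, "numeric comparison via cmp_ok()" );

done_testing();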
It's not Test::More that is returning 0.99999999999999996; it is $embed_pass->compare(...) -- that is, your method is what's returning that value. And if I skimmed the source correctly, you are calling a method from Data::CosineSimilarity, which you do not control. For floating-point math, if you don't control the chain 100%, then you cannot control whether rounding differences will occur in unexpected places. For floating-point values -- especially when they go through many steps, or through one or more steps that you do not control -- I highly recommend you decide what an acceptable precision is for your module's needs, and code your test so that it accepts anything within that range.
If you want to support 32-bit floats, I would say your test should make sure you are within 1e-6 * $expected -- i.e., cmp_ok abs($got/$exp-1), '<=', 1e-6 -- or maybe 0.5e-6 if you're brave. (It's been a while since I've done the calcs for whether that works with a 32-bit float, which has a 23-bit mantissa, but my back-of-the-envelope says the ULP is about 0.12e-6 relative to the power of two, and since the mantissa can be nearly twice the power of two, 0.25e-6 is about the tightest safe tolerance, so 0.5e-6 is as small as I'd want to go.) (Caveat: if $exp is 0, that of course won't work, and you could use abs($got-$exp) instead. But in this example $exp was 1, so you're safe.)
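As a sketch of that relative-tolerance test, with hypothetical stand-in values for what $embed_pass->compare(...) actually returns:

use strict;
use warnings;
use Test::More;

my $exp = 1;
my $got = 0.99999999999999996;   # hypothetical stand-in for $embed_pass->compare(...)

# Relative error: the fractional difference must be within 1e-6 of the expected value.
cmp_ok( abs( $got / $exp - 1 ), '<=', 1e-6, "compare() within 1e-6 of expected" );

# If $exp could ever be 0, compare the absolute difference instead:
# cmp_ok( abs( $got - $exp ), '<=', 1e-6, "compare() within 1e-6 of expected" );

done_testing();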
I would think that would be enough precision to make sure your module is doing its part of the job correctly.
Alternately, doing a sprintf '%.6f' on both the $got and $exp would allow you to do a string comparison using 'is', regardless of the floating size and stringification differences.
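For example (again with stand-in values), rounding both sides to six decimal places before the string comparison:

use strict;
use warnings;
use Test::More;

my $exp = 1;
my $got = 0.99999999999999996;   # hypothetical stand-in

# Both sides are formatted to 6 decimal places, so build-specific
# stringification of the far-out digits no longer matters.
is( sprintf( '%.6f', $got ), sprintf( '%.6f', $exp ), "compare() matches to 6 decimal places" );

done_testing();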
To sum up: when testing a module that does floating-point math, unless you know for sure you can guarantee exact values, you need to test against an expected accuracy rather than looking for exact values (whether that's done through sprintf-rounding or through a numeric comparison of the fractional delta against the accuracy, instead of got against expected).
These tests become much easier to read and write if you switch to Test::Deep with num or Test2::V0 with within.
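For instance, a sketch of both styles, assuming a tolerance of 1e-6 and a stand-in value for the real result:

use Test2::V0;

my $got = 0.99999999999999996;   # hypothetical stand-in

# Test2::V0: within($expected, $tolerance) accepts any value inside the band.
is( $got, within( 1, 1e-6 ), "compare() is approximately 1" );

# The Test::Deep equivalent uses num($expected, $tolerance):
#   use Test::Deep;
#   cmp_deeply( $got, num( 1, 1e-6 ), "compare() is approximately 1" );

done_testing;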
From the failing CPAN test report, the failing test seems to be this line number 37 in 01-openai.t:
is( $comp_pass1, 1, "Compare got $comp_pass1");
As a matter of style, I find it clearer to use the explicit cmp_ok for these types of tests
(see also this perlmaven article).
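For reference, the cmp_ok form of that line would look like the following; note that a plain '==' is still an exact comparison, so by itself it would not cure the rounding failure:

cmp_ok( $comp_pass1, '==', 1, "Compare got $comp_pass1" );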
Without fully understanding your module, it looks like a simple floating point rounding error;
the normal way to deal with these is to introduce an epsilon value,
as noted in Test::Number::Delta:
At some point or another, most programmers find they need to compare floating-point numbers for equality.
The typical idiom is to test if the absolute value of the difference of the numbers is within a desired tolerance, usually called epsilon.
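A sketch of what that could look like with Test::Number::Delta, using stand-in values for the ones in the failing test:

use strict;
use warnings;
use Test::More;
use Test::Number::Delta;

my $comp_pass1 = 0.99999999999999996;   # hypothetical stand-in

# Passes if the two numbers differ by no more than the given epsilon.
delta_within( $comp_pass1, 1, 1e-6, "Compare got $comp_pass1" );

# Or rely on the module's default epsilon of 1e-6:
# delta_ok( $comp_pass1, 1, "Compare got $comp_pass1" );

done_testing();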
If I were to guess, it's -Duselongdouble.
I had build failures with this flag (and also quadmath) with my module JSON::SIMD. In my case I had to adapt some of the code to use a slower but more precise number parser.
You could either find a way to fix your algorithms so that the tests really pass even with these build flags, or, if you decide that this loss of precision is acceptable (it only happens with a build flag that's usually off anyway), you could fudge or disable the test under these flags.
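A sketch of the "fudge or disable" route, assuming you only want to skip the exact comparison on long-double or quadmath builds, detected through %Config:

use strict;
use warnings;
use Config;
use Test::More;

SKIP: {
    skip "exact float comparison is unreliable with -Duselongdouble/-Dusequadmath", 1
        if $Config{uselongdouble} || $Config{usequadmath};

    # $comp_pass1 is a stand-in for the value computed earlier in the test
    my $comp_pass1 = 0.99999999999999996;
    is( $comp_pass1, 1, "Compare got $comp_pass1" );
}

done_testing();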
You could either find a way to fix your algorithms...
I can't... the calculations are not performed by my module; they are done by Data::CosineSimilarity.
if you decide that this loss of precision...
In every use case I'm aware of, precision isn't needed for embeddings. Rather, results are ranked from "1" (highest similarity) down to "-1" (lowest similarity) by how closely they match the meaning of the text being evaluated.
So, from a use case the test failure is not an issue. But it doesn't strike me as good practice to publish tests that fail...