Test::More::is compares using eq, so it's doing stringification on the floating point values. Different builds of perl can stringify differently in the far-out place values. So even if your module is returning exactly the same value, the string comparison might be different on two different systems.
IF you want numerical equivalence testing, you should use cmp_ok instead -- but that's a big "IF".
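To make the difference concrete, here is a minimal sketch of the two comparison styles; the value 0.99999999999999996 is just a stand-in for whatever your module returns, and neither line by itself cures the rounding problem -- it only shows which comparison each function performs:

use strict;
use warnings;
use Test::More;

my $got = 0.99999999999999996;   # stand-in for the value your module returns

# is() stringifies both sides and compares them with 'eq', so the result
# depends on how this particular perl build stringifies the float:
is( $got, 1, "string comparison via is()" );

# cmp_ok() with '==' compares the two values numerically instead:
cmp_ok( $got, '==', 1, "numeric comparison via cmp_ok()" );

done_testing();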
It's not Test::More that is returning 0.99999999999999996; it is $embed_pass->compare(...) -- that is, your method is what's returning that value. And if I skimmed the source correctly, you are calling a method from Data::CosineSimilarity, which you do not control. For floating-point math, if you don't control the chain 100%, then you cannot control whether rounding differences will occur in unexpected places. For floating-point values -- especially when they go through many steps, or through one or more steps that you do not control -- I highly recommend you decide what an acceptable precision is for your module's needs, and code your test so that it accepts anything within that range.
If you want to support 32-bit floats, I would say your test should make sure you are within 1e-6 * $expected -- i.e., cmp_ok abs($got/$exp-1), '<=', 1e-6 -- or maybe 0.5e-6 if you're brave. (It's been a while since I've done the calcs for whether that works with a 32-bit float, which has a 23-bit mantissa, but my back-of-the-envelope says the ULP is about 0.12e-6 relative to the power of two, and since the mantissa can be nearly twice the power of two, 0.25e-6 is about the tightest safe tolerance, so 0.5e-6 is as small as I'd want to go.) (Caveat: if $exp is 0, that of course won't work, and you could use abs($got-$exp) instead. But in this example $exp was 1, so you're safe.)
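As a sketch of that relative-tolerance test, with hypothetical stand-in values for what $embed_pass->compare(...) actually returns:

use strict;
use warnings;
use Test::More;

my $exp = 1;
my $got = 0.99999999999999996;   # hypothetical stand-in for $embed_pass->compare(...)

# Relative error: the fractional difference must be within 1e-6 of the expected value.
cmp_ok( abs( $got / $exp - 1 ), '<=', 1e-6, "compare() within 1e-6 of expected" );

# If $exp could ever be 0, compare the absolute difference instead:
# cmp_ok( abs( $got - $exp ), '<=', 1e-6, "compare() within 1e-6 of expected" );

done_testing();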
I would think that would be enough precision to make sure your module is doing its part of the job correctly.
Alternately, doing a sprintf '%.6f' on both the $got and $exp would allow you to do a string comparison using 'is', regardless of the floating size and stringification differences.
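For example (again with stand-in values), rounding both sides to six decimal places before the string comparison:

use strict;
use warnings;
use Test::More;

my $exp = 1;
my $got = 0.99999999999999996;   # hypothetical stand-in

# Both sides are formatted to 6 decimal places, so build-specific
# stringification of the far-out digits no longer matters.
is( sprintf( '%.6f', $got ), sprintf( '%.6f', $exp ), "compare() matches to 6 decimal places" );

done_testing();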
To sum up: when testing a module that does floating-point math, unless you know for sure you can guarantee exact values, you need to test against an expected accuracy rather than looking for exact values (whether that's done through sprintf-rounding or through a numeric comparison of the fractional delta against the accuracy, instead of got against expected).
These tests become much easier to read and write if you switch to Test::Deep with num or Test2::V0 with within.
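For instance, a sketch of both styles, assuming a tolerance of 1e-6 and a stand-in value for the real result:

use Test2::V0;

my $got = 0.99999999999999996;   # hypothetical stand-in

# Test2::V0: within($expected, $tolerance) accepts any value inside the band.
is( $got, within( 1, 1e-6 ), "compare() is approximately 1" );

# The Test::Deep equivalent uses num($expected, $tolerance):
#   use Test::Deep;
#   cmp_deeply( $got, num( 1, 1e-6 ), "compare() is approximately 1" );

done_testing;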
From the failing CPAN test report, the failing test seems to be this line number 37 in 01-openai.t:
is( $comp_pass1, 1, "Compare got $comp_pass1");
As a matter of style, I find it clearer to use the explicit cmp_ok for these types of tests
(see also this perlmaven article).
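For reference, the cmp_ok form of that line would look like the following; note that a plain '==' is still an exact comparison, so by itself it would not cure the rounding failure:

cmp_ok( $comp_pass1, '==', 1, "Compare got $comp_pass1" );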
Without fully understanding your module, it looks like a simple floating point rounding error;
the normal way to deal with these is to introduce an epsilon value,
as noted in Test::Number::Delta:
At some point or another, most programmers find they need to compare floating-point numbers for equality.
The typical idiom is to test if the absolute value of the difference of the numbers is within a desired tolerance, usually called epsilon.
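A sketch of what that could look like with Test::Number::Delta, using stand-in values for the ones in the failing test:

use strict;
use warnings;
use Test::More;
use Test::Number::Delta;

my $comp_pass1 = 0.99999999999999996;   # hypothetical stand-in

# Passes if the two numbers differ by no more than the given epsilon.
delta_within( $comp_pass1, 1, 1e-6, "Compare got $comp_pass1" );

# Or rely on the module's default epsilon of 1e-6:
# delta_ok( $comp_pass1, 1, "Compare got $comp_pass1" );

done_testing();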
If I were to guess, it's -Duselongdouble.
I had build failures with this flag (and also quadmath) with my module JSON::SIMD. In my case I had to adapt some of the code to use a slower but more precise number parser.
You could either find a way to fix your algorithms so that the tests really pass even with these build flags, or, if you decide that this loss of precision is acceptable (it only happens with a build flag that's usually off anyway), you could fudge or disable the test under these flags.
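A sketch of the "fudge or disable" route, assuming you only want to skip the exact comparison on long-double or quadmath builds, detected through %Config:

use strict;
use warnings;
use Config;
use Test::More;

SKIP: {
    skip "exact float comparison is unreliable with -Duselongdouble/-Dusequadmath", 1
        if $Config{uselongdouble} || $Config{usequadmath};

    # $comp_pass1 is a stand-in for the value computed earlier in the test
    my $comp_pass1 = 0.99999999999999996;
    is( $comp_pass1, 1, "Compare got $comp_pass1" );
}

done_testing();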
You could either find a way to fix your algorithms...
I can't... the calculations are not performed by my module; they are done by Data::CosineSimilarity.
if you decide that this loss of precision...
In every use case I'm aware of, precision isn't needed for embeddings. Rather, results are ranked from "1" (highest similarity) down to "-1" (lowest similarity) by how closely they match the meaning of the text being evaluated.
So, from a use case the test failure is not an issue. But it doesn't strike me as good practice to publish tests that fail...