I have two questions here, one about what I want to do, the other about what I'm currently doing.
I'm writing tests for an early-stage mathematical proof checker (axiom). Some tests have multiple results that would class as "correct", but some of those results are better than others.
For example, when stringifying a mathematical expression such as a-b, the result 'a(-b)' would be "not ok", but any of 'a+(-b)', 'a+-b', 'a-b' would be "ok" - and with some more ok than others.
Has anybody implemented a test regime that accounts for this sort of thing? The only approach I've thought of so far is some sort of "quality" metric, independent of pass/fail statistics, wherein "more ok" results push the quality metric up. It is not obvious to me how to quantify that though, or how results from individual tests should combine.
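To make the idea concrete, here is one possible sketch of such a "graded ok", built on plain Test::More. Everything here is hypothetical: `ok_graded` and `stringify_minus` are made-up names (the latter a stand-in for the real stringifier), and the per-result scores are arbitrary. Any nonzero-scored result passes; the scores accumulate into a quality figure reported separately from pass/fail.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

my ($quality, $quality_max) = (0, 0);

# Hypothetical helper: $scores maps each acceptable result to a
# quality score in (0,1]; anything not listed scores 0 and fails.
sub ok_graded {
    my ($got, $scores, $name) = @_;
    $quality_max += 1;
    my $score = exists $scores->{$got} ? $scores->{$got} : 0;
    $quality += $score;
    ok($score > 0, $name)
        or diag("got '$got', which scores 0");
}

# Stand-in for the real stringifier under test.
sub stringify_minus { 'a+(-b)' }

ok_graded(stringify_minus(),
    { 'a-b' => 1.0, 'a+(-b)' => 0.5, 'a+-b' => 0.25 },
    'stringify a-b');

done_testing();
note(sprintf "quality: %.1f%% (%g of %g)",
    100 * $quality / $quality_max, $quality, $quality_max);
```

How individual scores should combine (mean, minimum, weighted by test importance) is exactly the open question; the mean used here is just the simplest choice.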
Secondly, I'm currently trying to implement these as TODO tests, using Test::More and prove. However, when I run a test file individually, a failing TODO test stands out like a true fail; and when I run under prove it just gives output like:
  t/expr-stringify.t .. ok
  All tests successful.
  Files=1, Tests=29,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.31 cusr +  0.04 csys =  0.38 CPU)
  Result: PASS

.. which gives no indication that there were failing TODO tests.
What is people's preferred way to get a clean indication of "all passing, but with $n failing TODO tests"?
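One possible approach, sketched below: TAP itself still records the real result of a TODO test, so a small wrapper using TAP::Parser (the module prove is built on) can count the TODO tests that actually failed. The inline TAP string here is only illustrative; a real wrapper would take a test file as its source instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use TAP::Parser;

# Illustrative TAP stream; in practice use
# TAP::Parser->new({ source => 't/expr-stringify.t' }).
my $tap = <<'END_TAP';
1..3
ok 1 - stringify a+b
not ok 2 - stringify a-b # TODO minus form not minimal yet
ok 3 - stringify a*b
END_TAP

my $parser = TAP::Parser->new({ tap => $tap });
my $todo_failed = 0;
while ( my $result = $parser->next ) {
    next unless $result->is_test && $result->has_todo;
    # is_actual_ok gives the raw ok/not ok, before TODO rewriting
    ++$todo_failed unless $result->is_actual_ok;
}
print "all passing, but with $todo_failed failing TODO tests\n";
```

Run against the sample stream above this prints "all passing, but with 1 failing TODO tests". (Note that prove does report the opposite case, an unexpectedly *passing* TODO, as "TODO passed"; it is only the expected failures that vanish silently.)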
In reply to Testing: shades of grey by hv