in reply to Testing: shades of grey
Thank you all for your comments
With regard to the second question, the current verbose output looks like:
.. which I cannot trivially eyeball (certainly once there are 100s of tests) to distinguish true FAILs from TODOs. I could live with eyeballable verbose output looking like:ok 12 - expr -a+b ok 13 - expr 1/a ok 14 - expr a.b not ok 15 - expr a/b # TODO # Failed (TODO) test 'expr a/b' # at t/expr-stringify.t line 58. # got: 'a.(1/b)' # expected: 'a/b' ok 16 - expr a.(1/b) ok 17 - expr (1/a).b ok 18 - expr a^2
ok 12 - expr -a+b ok 13 - expr 1/a ok 14 - expr a.b not ok 15 - expr a/b # TODO ok 16 - expr a.(1/b) ok 17 - expr (1/a).b ok 18 - expr a^2
.. but would prefer non-verbose prove output that in some way (any way) mentioned "and 1 TODO". I understand that App::Prove is intended to be extensible, so I was hoping somebody would already have done this or have known how to do it, but I guess I'll need to dig in to that.
For the first question, I will have to take some time to absorb all the suggestions. Certainly I will start with [LanX]'s suggestion to put all OK results into one test - maybe I'd actually put them in an array rather than a hash, which gives me a free (if low-quality) ranking. But I need to think more about how to decide "OKness" scores for each of the results, and significance (weighting) scores for one test versus another.
My code only implements the mathematical constructs and the transformation rules that I have needed so far, so I don't think I can usefully test it at this point against other proof checkers, nor usefully verify it against scraped sources of LateX expressions. That said, if Lean been designed to be useful with a command line and text files I probably wouldn't be developing this.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Testing: shades of grey
by LanX (Saint) on Dec 24, 2024 at 23:56 UTC | |
by LanX (Saint) on Dec 26, 2024 at 02:08 UTC |