in reply to Re: Theorem proving and Unit testing
in thread Theorem proving and Unit testing

Remember - unit tests aren't meant to prove the algorithm works; they're meant to determine how well the implementation adheres to the algorithm. So, even if you use all the cases in the proof of the algorithm as tests of the implementation, you still haven't proven the validity of the code the way you can prove the validity of the algorithm.

I guess the way I am looking at it is that I am not so much trying to directly prove the validity of the algorithms as I am trying to test how well the functions preserve the integrity of the data structure, and through that indirectly test the validity of the algorithms. This is why I thought that using random data might work out well. Here is an example from the binary tree proofs which may help explain my point of view better (assume 't' is the binary tree).

postorder(mirror(t)) = reverse(preorder(t))

This basically states that the postorder of a mirror of the tree should be equal to the reverse of the preorder of that same tree. It (indirectly) tests the validity of the mirror, postorder and preorder algorithms (reverse is Perl's built-in and assumed to work) by testing that they all work together in the correct way. Running this test on hundreds of randomly constructed trees should, IMO, say something about the underlying algorithms. Am I wrong in assuming that? Of course, for a test like this to pass when the algorithms are wrong, they would all have to be wrong in the same, mutually compensating way.
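
To make that concrete, here is a rough sketch of the kind of test I have in mind, using Test::More. The hashref tree shape and the random_tree/preorder/postorder/mirror subs below are throwaway stand-ins I made up for the sketch, not the real implementations under test:

    use strict;
    use warnings;
    use Test::More tests => 100;

    # Throwaway hashref tree: { value => $n, left => $subtree, right => $subtree },
    # with undef for an empty tree. random_tree just builds arbitrary shapes.
    sub random_tree {
        my $depth = shift;
        return undef if $depth <= 0 || rand() < 0.3;
        return {
            value => int rand 1000,
            left  => random_tree( $depth - 1 ),
            right => random_tree( $depth - 1 ),
        };
    }

    sub preorder {
        my $t = shift;
        return () unless defined $t;
        return ( $t->{value}, preorder( $t->{left} ), preorder( $t->{right} ) );
    }

    sub postorder {
        my $t = shift;
        return () unless defined $t;
        return ( postorder( $t->{left} ), postorder( $t->{right} ), $t->{value} );
    }

    # Mirror swaps the left and right subtrees all the way down.
    sub mirror {
        my $t = shift;
        return undef unless defined $t;
        return {
            value => $t->{value},
            left  => mirror( $t->{right} ),
            right => mirror( $t->{left} ),
        };
    }

    # The property under test: postorder(mirror(t)) == reverse(preorder(t))
    for my $i ( 1 .. 100 ) {
        my $t = random_tree(5);
        is_deeply(
            [ postorder( mirror($t) ) ],
            [ reverse preorder($t) ],
            "postorder(mirror(t)) == reverse(preorder(t)) for random tree $i",
        );
    }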

-stvn

Re^3: Theorem proving and Unit testing
by dragonchild (Archbishop) on Aug 26, 2004 at 12:20 UTC
    It does say something about the implementation of the algorithm. It doesn't actually say a darn thing about the algorithm itself.

    Here's another way to look at it. You're trying to say "If the algorithm passes test X, then the implementation should pass test X." Absolutely true. However, you can throw every test like this that you can think of at a given implementation and still have an implementation that doesn't actually implement the algorithm; it just happens to pass all your tests.

    You can still look at the proof of an algorithm, though, and glean a number of tests which will have the largest confidence increase per test. For example, if you threw hundreds of randomly-constructed trees at that test, you might gain a confidence of 90% in the algorithm for a cost of 300 tests. You can gain that same 90% confidence by looking at the edge cases of trees and constructing, say, 10 tests. So, each test in the second series is worth 30 tests from the first series.

    For example, I would look at testing (sketched as tests just after this list):

    • The null tree
    • A tree with only a root
    • root + 1 child
    • root + 2 children
    • linear three generations
    • etc ...
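
    Using a throwaway hashref tree, those shapes might be sketched as tests like this. The mirror sub and the mirror-twice-gives-back-the-original check are just a simple stand-in property chosen to keep the sketch self-contained, not the actual tests I'd ship:

        use strict;
        use warnings;
        use Test::More;

        # Minimal stand-in for a real mirror under test: swap left and right
        # subtrees all the way down.
        sub mirror {
            my $t = shift;
            return undef unless defined $t;
            return {
                value => $t->{value},
                left  => mirror( $t->{right} ),
                right => mirror( $t->{left} ),
            };
        }

        sub leaf {
            my $value = shift;
            return { value => $value, left => undef, right => undef };
        }

        # The edge-case shapes from the list above.
        my %cases = (
            'null tree'             => undef,
            'root only'             => leaf(1),
            'root + 1 child'        => { value => 1, left => leaf(2), right => undef },
            'root + 2 children'     => { value => 1, left => leaf(2), right => leaf(3) },
            'linear, 3 generations' => {
                value => 1,
                left  => { value => 2, left => leaf(3), right => undef },
                right => undef,
            },
        );

        # One simple invariant checked against each shape: mirroring twice
        # gives back the original tree.
        for my $name ( sort keys %cases ) {
            my $t = $cases{$name};
            is_deeply( mirror( mirror($t) ), $t, "mirror(mirror(t)) == t for $name" );
        }

        done_testing();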

    At that point, you can probably look at the tests and develop an inductive proof that says "I don't need to test a linear tree six generations deep, because if that would break, a linear tree three generations deep would break in the same way." Most people intuitively develop tests like this, but most don't actually go through the effort of proving that their tests will work.

    Another way to look at this: if you're testing something with 30 binary properties, you don't actually need to test all 2**30 possible combinations. You only need to run the 32 basic tests (1 with everything off, 30 turning each property on by itself, 1 with everything on), and then test turning certain ones on together, based on the specification and a little knowledge of how the code works. Now, you and I would both love to have the 2**30 tests to make sure that unintended interactions don't creep in. But I know I don't have the CPU time to run a billion+ tests. So, I don't have 100% confidence in my code, but I do have 99% confidence, based on the 32+ tests I ran. And I can run those tests every time I make a tiny change. I might actually run the billion+ tests (or a random subset of a million tests) if I were building out to production and could let it run overnight, but that wouldn't be a standard part of my test suite.
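
    As a rough sketch (with made-up option names), generating those 32 basic configurations might look like this:

        use strict;
        use warnings;

        # Hypothetical option names standing in for the 30 real binary properties.
        my @options = map "opt$_", 1 .. 30;

        # Build one configuration hashref: everything off except the named options.
        sub make_config {
            my %config;
            $config{$_} = 0 for @options;
            $config{$_} = 1 for @_;
            return \%config;
        }

        my @configs;
        push @configs, make_config();                  # 1 test: everything off
        push @configs, make_config($_) for @options;   # 30 tests: each option on by itself
        push @configs, make_config(@options);          # 1 test: everything on

        # Each configuration would then be fed to whatever actually exercises the code.
        printf "%d configurations instead of %d\n", scalar @configs, 2 ** 30;
        # prints: 32 configurations instead of 1073741824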

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested