Theorem proving and Unit testing

by stvn (Monsignor)
on Aug 25, 2004 at 18:48 UTC

So let me start by saying that I was not a math major, and have never really studied math formally, so if I am way off on some of these concepts, please correct me.

I have been reading recently about using more formal methods, like structural induction and theorems, as ways to verify the correctness of a program. This is much more prevalent in functional programming circles, and Standard ML seems to be well suited for it. So as I was reading through my Standard ML book, I stumbled upon a number of theorems about lists. It struck me how easily they might be converted into unit tests, using Test::More as the logical framework with which to "prove" the theorems. Now, I know that this is by no means a formal proof method, and that it might give some staunch math professors indigestion at the thought of it. But to my non-mathematical mind, it seemed like a great way to build some unit tests for testing the integrity of data structures. But before I get too excited about this, I thought I would put it out for the mathematically inclined folks in the monastery and see what you thought about it.

On the unit-test side of things, while finishing a set of unit tests for my module Tree::Binary, I noticed that I found myself creating arbitrary trees to test the values and actions of certain methods. After reading a few theorems about binary trees in the same book where I had read the list theorems, I realized that some of my tests actually looked a little like those theorems. The difference was that instead of testing the equality of two expressions, I was testing an expression against a manually arrived-at expected value. That seems somewhat fragile to me, and the idea of writing a test that would prove true in any situation seemed much more solid.

Now my (likely flawed) understanding of mathematical induction is that given a property P, you want to prove P(x) for all values of x (or all natural numbers). And with structural induction, the aim is to prove P(x) for all empty and non-empty forms of x. Since testing every possible value of x is not feasible, and the reasoning of formal proof methods cannot really be used inside unit tests, my idea was to use (pseudo-)randomly generated structures instead. I don't know if this is way off or not.

Now, at first glance this might not look like it tests all that much. And you would be right, it doesn't. But this is really just an example; my real question is how valuable this approach to testing might be.

#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 4;

my @list1 = rand_list((rand() * 100));
my @list2 = rand_list((rand() * 100));
my @list3 = rand_list((rand() * 100));

# length(append(L1, L2)) = length(L1) + length(L2)
cmp_ok(
    len(append(@list1, @list2)), '==', (len(@list1) + len(@list2)),
    '... length(append(L1, L2)) = length(L1) + length(L2)');

# reverse(append(L1, L2)) = append(reverse(L2), reverse(L1))
is_deeply(
    [ rev(append(@list1, @list2)) ],
    [ append(rev(@list2), rev(@list1)) ],
    '... reverse(append(L1, L2)) = append(reverse(L2), reverse(L1))');

# append(L1, append(L2, L3)) = append(append(L1, L2), L3)
is_deeply(
    [ append(@list1, append(@list2, @list3)) ],
    [ append(append(@list1, @list2), @list3) ],
    '... append(L1, append(L2, L3)) = append(append(L1, L2), L3)');

# reverse(reverse(L1)) = L1
is_deeply(
    [ rev(rev(@list1)) ],
    [ @list1 ],
    '... reverse(reverse(L1)) = L1');

# ---------------------------------------------

sub rand_list {
    my ($num_items) = @_;
    # force an integer length of at least 1 so the modulus below
    # never divides by zero
    $num_items = int($num_items) || 1;
    return map { (rand() * 100) % $num_items } 0 .. $num_items;
}

sub len {
    my @list = @_;
    return scalar @list;
}

sub append {
    my (@combined) = @_;
    return @combined;
}

sub rev {
    my @list = @_;
    return reverse @list;
}

1;
-stvn

Replies are listed 'Best First'.
Re: Theorem proving and Unit testing
by dragonchild (Archbishop) on Aug 25, 2004 at 18:58 UTC
    Interesting topic. My take on it would be:
    • Write up an algorithm. Use some sort of formal method to prove the algorithm works.
    • Write up some code to implement that algorithm. Then design test cases based on that proof. (Random is usually bad, imho.)

    That would definitely take the confidence level in the tests up quite a ways. Not to the 100% that mathematical proofs have, but a lot higher than in-the-dark stabbing.

    Remember - unit tests aren't meant to prove the algorithm works; they're meant to determine the adherence of the implementation to the algorithm. So, even if you use all the cases in the proof of the algorithm as tests of the implementation, you still haven't proven the validity of the code like you can prove the validity of the algorithm.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Remember - unit tests aren't meant to prove the algorithm works; they're meant to determine the adherence of the implementation to the algorithm. So, even if you use all the cases in the proof of the algorithm as tests of the implementation, you still haven't proven the validity of the code like you can prove the validity of the algorithm

      I guess the way I am looking at it is that I am not so much looking to directly prove the validity of the algorithm as I am trying to test how well the functions retain the integrity of the data structure, and through that indirectly test the validity of the algorithms. This is why I thought that using random data might work out well. Here is an example from the binary tree proofs which may help explain my point of view better (assume 't' is the binary tree).

      postorder(mirror(t)) = reverse(preorder(t))

      This basically states that the postorder of a mirror of the tree should be equal to the reverse of the preorder of that same tree. This (indirectly) tests the validity of the mirror, postorder and preorder algorithms (reverse is Perl's built-in and assumed to work) by testing that they all work together in the correct way. Running this test on hundreds of randomly constructed trees IMO should say something about the underlying algorithms. Am I wrong in assuming that? Of course if the algorithms are wrong, then with a test like this, they would all have to be wrong in the same way.
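
      To make that concrete, here is a rough sketch of the kind of test I have in mind. The bare hashref tree and the rand_tree, mirror, preorder and postorder helpers are just things I made up here for illustration; this is not Tree::Binary's actual API.

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Test::More tests => 100;

      # throw the identity at 100 randomly constructed trees
      for (1 .. 100) {
          my $t = rand_tree(int(rand() * 10) + 1);
          is_deeply(
              [ postorder(mirror($t)) ],
              [ reverse(preorder($t)) ],
              '... postorder(mirror(t)) = reverse(preorder(t))');
      }

      # build a random tree of up to $depth levels; each node is a
      # hashref with a value and (possibly undef) left/right children
      sub rand_tree {
          my ($depth) = @_;
          return undef if $depth <= 0 || rand() < 0.2;
          return {
              value => int(rand() * 100),
              left  => rand_tree($depth - 1),
              right => rand_tree($depth - 1),
          };
      }

      # swap left and right children all the way down
      sub mirror {
          my ($t) = @_;
          return undef unless defined $t;
          return {
              value => $t->{value},
              left  => mirror($t->{right}),
              right => mirror($t->{left}),
          };
      }

      sub preorder {
          my ($t) = @_;
          return () unless defined $t;
          return ($t->{value}, preorder($t->{left}), preorder($t->{right}));
      }

      sub postorder {
          my ($t) = @_;
          return () unless defined $t;
          return (postorder($t->{left}), postorder($t->{right}), $t->{value});
      }

      Each run throws 100 random trees at the identity, so a broken mirror or traversal should get caught pretty quickly (unless, as I said, they are all broken in the same way).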

      -stvn
        It does say something about the implementation of the algorithm. It doesn't actually say a darn thing about the algorithm itself.

        Here's another way to look at it. You're trying to say "If the algorithm passes test X, then the implementation should pass test X." Absolutely true. However, you can throw every test like this that you can think of against a given implementation, and the implementation might still not implement the algorithm at all; it just happens to pass all your tests.

        You can still look at the proof of an algorithm, though, and glean a number of tests which will have the largest confidence increase per test. For example, if you threw hundreds of randomly-constructed trees at that test, you might gain a confidence of 90% in the algorithm for a cost of 300 tests. You can gain that same 90% confidence by looking at the edge cases of trees and constructing, say, 10 tests. So, each test in the second series is worth 30 tests from the first series.

        For example, I would look at testing:

        • The null tree
        • A tree with only a root
        • root + 1 child
        • root + 2 children
        • linear three generations
        • etc ...

        At that point, you can probably look at the tests and develop an inductive proof that says "I don't need to test linear six generations, because if linear six generations would break, then linear three generations would break in the same way." Most people intuitively develop tests like this, but most don't actually go through the effort of proving that their tests will work.
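
        A hand-picked suite along those lines might look something like this rough sketch. The hashref tree shape, the mirror helper, and the mirror(mirror(t)) = t identity are all just stand-ins for whatever property you actually proved, not anyone's real API:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Test::More tests => 5;

        # swap left and right children all the way down; the property
        # under test is that mirroring twice gives back the original tree
        sub mirror {
            my ($t) = @_;
            return undef unless defined $t;
            return {
                value => $t->{value},
                left  => mirror($t->{right}),
                right => mirror($t->{left}),
            };
        }

        my %edge_cases = (
            'null tree'                => undef,
            'root only'                => { value => 1, left => undef, right => undef },
            'root + 1 child'           => { value => 1,
                                            left  => { value => 2, left => undef, right => undef },
                                            right => undef },
            'root + 2 children'        => { value => 1,
                                            left  => { value => 2, left => undef, right => undef },
                                            right => { value => 3, left => undef, right => undef } },
            'linear three generations' => { value => 1,
                                            left  => { value => 2,
                                                       left  => { value => 3, left => undef, right => undef },
                                                       right => undef },
                                            right => undef },
        );

        while (my ($name, $tree) = each %edge_cases) {
            is_deeply(mirror(mirror($tree)), $tree, "... mirror(mirror(t)) = t for $name");
        }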

        Another way to look at this is if you're testing something with 30 binary properties, you don't actually need to test all 2**30 possible situations. You only need to run the 32 basic tests (1 all off, 30 for turning each one on, 1 all on), then turn certain ones on together, based on the specification and a little knowledge of how the code works. Now, you and I would both love to have the 2**30 tests to make sure that unintended interactions don't creep in. But, I know I don't have the CPU time to run a billion+ tests. So, I don't have 100% confidence in my code, but I do have 99% confidence, based on the 32+ tests I ran. And, I can run those tests every time I make a tiny change. I might actually run the billion+ tests (or a random subset of a million tests) if I were building out to production and could let it run overnight, but that wouldn't be a standard part of my test suite.
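
        Generating those 32 basic cases is mechanical. A sketch, where run_system() is a dummy stand-in for whatever is actually under test:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Test::More tests => 32;

        my $num_flags = 30;

        # the 32 basic cases: all off, each flag on by itself, all on
        my @cases = (
            [ (0) x $num_flags ],
            (map { my $i = $_;
                   [ map { $_ == $i ? 1 : 0 } 0 .. $num_flags - 1 ] }
                 0 .. $num_flags - 1),
            [ (1) x $num_flags ],
        );

        for my $flags (@cases) {
            ok(run_system(@$flags), '... system behaves with flags: ' . join('', @$flags));
        }

        sub run_system {
            my (@flags) = @_;
            # dummy implementation so the sketch runs; replace with
            # the real thing
            return 1;
        }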

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: Theorem proving and Unit testing
by kvale (Monsignor) on Aug 25, 2004 at 19:15 UTC
    Mathematical induction is as follows. Suppose you want to prove some property holds for all x, where x is a member of some sequence, e.g., the natural numbers. Then prove
    • the property holds for the first element in the sequence
    • if the property holds for an element in the sequence, then it holds for the next element in the sequence
    Then by repeatedly applying the second result, you have shown by induction that the property holds for all elements.

    Scientific induction, a part of the scientific method, is a looser version of this. It says that if you (even better, others as well) try an experiment a bunch of times under the same conditions and get the same result, that result is likely to be a solid 'fact' upon which you can build theories and reason about other experiments. The more times you and others try it, the more confidence you build.

    Your procedure above is more akin to scientific induction than mathematical induction. Which is fine, but it isn't really a formal method in that you are not proving anything, just building evidence that code seems to work.

    -Mark

      Your procedure above is more akin to scientific induction than mathematical induction.

      I am sorry if I wasn't clear, but I am not talking about doing mathematical induction so much as structural induction, which is a more generalized version of mathematical induction. To my understanding, it says that to prove P(x) you must prove P for both the empty and the non-empty forms of x.

      Which is fine, but it isn't really a formal method in that you are not proving anything, just building evidence that code seems to work.

      I am not hoping for it to be a truly formal method, but more a basis to build good unit tests off of. It seems to me that testing arbitrary data against manually calculated results (which is commonly how people build unit tests) is not as good as some kind of (possibly) provable statement which can be translated into a unit test.

      -stvn

        Structural induction is not only possible, but fun and easy (well, possibly for grad-student values thereof). What you want to do is remember that induction consists of a base and a recursive case -- for mathematical induction, this comes from the fact that every natural number but zero is the successor of another nat.

        For a binary tree, you'd prove theorems inductively by proving them on leaf nodes, then proving that if both of an interior node's children satisfy the theorem, the interior node must as well. (It's a bit more difficult with lazy languages, but that's not your problem here.)
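
        Spelled out for the postorder/mirror identity above (my own sketch, writing ++ for list concatenation and node(v, l, r) for an interior node with value v and children l and r):

        Base case (a leaf L):
            postorder(mirror(L)) = [L] = reverse([L]) = reverse(preorder(L))

        Inductive step (assume the identity holds for l and r):
            postorder(mirror(node(v, l, r)))
                = postorder(node(v, mirror(r), mirror(l)))
                = postorder(mirror(r)) ++ postorder(mirror(l)) ++ [v]
                = reverse(preorder(r)) ++ reverse(preorder(l)) ++ [v]    (by hypothesis)
                = reverse([v] ++ preorder(l) ++ preorder(r))
                = reverse(preorder(node(v, l, r)))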

        Nice job on the root node, btw.

        --
        F o x t r o t U n i f o r m
        Found a typo in this node? /msg me
        % man 3 strfry

        An alternative way (to that described by FoxtrotUniform) of doing structural induction for binary trees would be:
        • prove it for a tree built of only one node
        • prove that if it is true for two binary trees, then it is true for the tree built from those two trees joined by a new root node
        The hard part is the 'if it is true ... then ...' part, i.e. the induction step; I don't see how this can be achieved with randomly generated trees.
Re: Theorem proving and Unit testing
by sleepingsquirrel (Chaplain) on Aug 25, 2004 at 20:24 UTC
    You might want to take a look at something like QuickCheck to get further inspiration.
    QuickCheck is a tool for testing Haskell programs automatically. The programmer provides a specification of the program, in the form of properties which functions should satisfy, and QuickCheck then tests that the properties hold in a large number of randomly generated cases.


    -- All code is 100% tested and functional unless otherwise noted.

      Thanks for the link, this looks really interesting, very much what I am thinking about. Now if only there were a Perl version... hmmm. Have you yourself ever used this system?

      -stvn
        Nope. Never used it myself, although I've been planning to for a while now. And if you look at the bottom of the page you'll see people have imported the idea to languages like Erlang, Scheme, Common Lisp, and Python, so I'd think a perl version wouldn't be too tough.
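
        Just to show the basic shape, here's a rough, hand-rolled stab at the "check a property against a pile of random inputs" idea on top of Test::More. for_all and the generator are names I made up on the spot, not part of any existing module:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Test::More tests => 2;

        # run a property against a batch of randomly generated inputs
        # and report it as a single test
        sub for_all {
            my ($name, $generator, $property, $times) = @_;
            $times ||= 100;
            for (1 .. $times) {
                my @input = $generator->();
                unless ($property->(@input)) {
                    fail("$name (falsified by: @input)");
                    return;
                }
            }
            pass("$name (held for $times random cases)");
        }

        # generator: a random list of 0 to 9 integers
        my $rand_list = sub { map { int(rand() * 100) } 1 .. int(rand() * 10) };

        # reverse(reverse(L)) = L
        for_all('reverse . reverse == identity',
                $rand_list,
                sub { join(',', reverse reverse @_) eq join(',', @_) });

        # length(reverse(L)) = length(L)
        for_all('reverse preserves length',
                $rand_list,
                sub { my @l = @_; my @r = reverse @l; @r == @l });

        A real port would also want QuickCheck's shrinking of failing cases and a richer set of generators, but this is the basic shape.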


        -- All code is 100% tested and functional unless otherwise noted.
