http://qs1969.pair.com?node_id=270259

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Test contracts -- shoring up some testing problems

This is a long node dealing with various thoughts about building robust software and an idea I am working on to get around some annoying problems with testing. Specifically, unit tests don't check that different components work together, while integration tests often miss where an actual problem lies. What follows are some thoughts I've had about how to get around this problem.

Part 1: Argument handling

In Perl, all subroutines are inherently variadic. That is to say, they take a variable number of arguments:

sub foo {
    my ($foo, $bar, @baz) = @_;
    ...
}

The nice thing about that is that it makes argument handling easy. The not-so-nice thing is that it makes argument handling too simplistic. Anyone who's longed for the ability to overload methods based on their signatures knows what I'm talking about. For example, in Java, you can do this:

public void set_foo(int bar)             { ... }
public void set_foo(int bar, int length) { ... }

The system knows to call the correct method based upon the number and types of arguments that I supply. If I want to do that in Perl, I frequently have to set up complicated conditionals in my subs to properly dispatch based upon my arguments (or use another module such as Class::Multimethods). The name, return type, and argument types of a method are referred to as its signature. However, it's fair to ask what those method signatures really buy us. When I declare that the first argument to foo() is an integer, I am really doing nothing more than adding a very simplistic test. To a large extent, statically typed languages are all about sprinkling simple tests throughout our code, but the types that are declared are a bit arbitrary. What if, in reality, the domain of the first argument to foo() is not all integers, but all even integers greater than zero? Having it declared merely as an integer is barely acceptable.
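In Perl, a rough hand-rolled equivalent of that overloading usually ends up as a dispatch on @_ inside a single sub, along these lines (the Widget package and set_foo method here are purely illustrative):

    package Widget;
    use Carp;

    # One sub dispatching "by hand" on the number of arguments,
    # standing in for what Java resolves through overloaded signatures.
    sub set_foo {
        my $self = shift;
        if (@_ == 1) {
            my ($bar) = @_;
            croak "bar must be an integer" unless $bar =~ /^-?\d+$/;
            $self->{foo} = $bar;
        }
        elsif (@_ == 2) {
            my ($bar, $length) = @_;
            croak "bar must be an integer"    unless $bar    =~ /^-?\d+$/;
            croak "length must be an integer" unless $length =~ /^-?\d+$/;
            @{$self}{qw(foo length)} = ($bar, $length);
        }
        else {
            croak "set_foo called with the wrong number of arguments";
        }
        return $self;
    }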

One way around this is to exhaustively test every argument to every subroutine or method.

use Carp;

sub foo {
    my ($bar, $baz, $quux) = @_;
    croak "bad bar" unless $bar > 0 && !($bar % 2);
    # now test $baz
    # now test $quux
    # do our stuff
}

Um, sure. We all do this, right? No, we don't. There are a variety of problems with this. The first is obvious: we can end up with so much validation code in every subroutine or method that we start to obscure the intent of the code. Second, programmers often think that since this subroutine is buried deep within the system and never gets called directly by the user, all they have to do is ensure that it never gets passed bad data. This is a common strategy and actually isn't all that bad if you have a good test suite.

Another problem is that while such checks tell us that what we received was good, they tell us nothing about the state of what we return, or whether the state of the class in which we're operating has been left in good shape (i.e., a class or package variable has not been changed to an inconsistent value).

That's where Design by Contract (DBC) gets involved. With contracts (implemented in Perl with Class::Contract), we can carefully specify the domain of what we accept, the domain of what we emit, and invariants which must stay within a certain domain when we're done (like the class and package variables mentioned above). However, Class::Contract is specifically tied to the concept of classes, objects and methods. If you're writing a functional module, it doesn't seem like a proper conceptual fit. Further, it's not exactly intuitive for most to use. We want this stuff to be easy.
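To make those three kinds of checks concrete without tying ourselves to Class::Contract's API, here is a rough hand-rolled sketch (the Counter package, its $count variable, and next_even() are all invented for illustration):

    package Counter;
    use Carp;

    our $count = 0;    # package variable the invariant watches

    sub next_even {
        my ($step) = @_;

        # pre-condition: the domain of what we accept
        croak "step must be a positive even integer"
            unless defined $step && $step > 0 && $step % 2 == 0;

        $count += $step;
        my $result = $count;

        # invariant: package state must stay in a sane domain
        croak "count went negative" if $count < 0;

        # post-condition: the domain of what we emit
        croak "result must be even" if $result % 2;

        return $result;
    }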

Part 2: Argument handling and testing

I've been thinking about a problem that sometimes crops up in testing. There are many different types of testing, but I'm thinking specifically about unit and integration testing. Let's say that I have five components, A, B, C, D, and E. I unit test the heck out of those components and they pass.

Now I do integration testing. Let's say that I test A and it calls B, which calls C, etc., all the way down to E. In the unit testing, I merely mocked up the interface to B, so A doesn't actually call it. In the integration tests, A actually calls B and even if all of our unit tests pass, the integration tests sometimes fail because of weird API problems. We can think of the call chain like this:

+-----+      +-----+      +-----+      +-----+      +-----+
|     |  AB  |     |  BC  |     |  CD  |     |  DE  |     |
|  A  |----->|  B  |----->|  C  |----->|  D  |----->|  E  |
|     |      |     |      |     |      |     |      |     |
+-----+      +-----+      +-----+      +-----+      +-----+

In other words, the unit tests ignore AB, BC, CD, and DE. However, the integration tests also tend to ignore those. To properly test that chain and every step in it, we might consider testing DE, then CDE, then BCDE, etc. In reality, what I see happening in most test suites is that A gets tested with integration testing and the unit tests are skipped, or done very poorly. Then, when A gets a bad result, we're not always sure where it happened. Or worse, we see that E dies and we don't always know where the bad data came from.
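As an aside, the "mocked up the interface to B" step above usually boils down to something like the following in A's unit tests (A, B, and the sub names are placeholders, not real modules):

    use Test::More tests => 1;
    use A;

    {
        # Swap B's entry point out for a stub for the duration of this
        # block, so A's logic is exercised without ever crossing the
        # AB boundary.
        no warnings 'redefine';
        local *B::fetch = sub { return 'canned result' };

        is( A::process('some input'), 'expected output',
            'A behaves correctly against a mocked B' );
    }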

Personally, I think this reflects a very real-world problem. I need to get my product out the door and the client is willing to accept a certain minimum level of bugs if this keeps the costs down. It's not possible to build a test suite that tests every possible combination of what can go wrong, so people write a bunch of unit tests and skip integration tests, or they write the integration tests and skip the unit tests, or they do a little of both (or just skip the tests).

Let's say in our testing that C produces a fatal error when arguments meet certain conditions. Why didn't the programmer write code to trap it in C? Because we realize that C is never called directly by the end user, but instead is fed a carefully massaged set of data which ensures that C can only receive safe data. Well, that's the theory, anyway. The reality is that C still sometimes gets bad data and we don't throw validation into every single function because we'll have so much validation code that our lumbering beast of a system is a bear to maintain. We don't know if C was passed bad data by B, or if C perhaps called D which called E which generated the bad data that gets returned. We have to stop and debug all of that to figure out where the problem lies.

Part 3: Test::Contract -- DBC for tests

Imagine a "design by contract" with testing. This combines a couple of ideas. First, I took many of the ideas from the Parameter Object thread. I'm also thinking about some of the work from Class::Contract, but making it more general to fit regular subroutines and not just methods. Some psuedo-code for the concept is like this:

sub assign_contract {
    my ($function_name, %contract) = @_;
    no strict 'refs';
    my $original_function = \&$function_name;
    *{$function_name} = sub {
        my @arguments = @_;
        # run the pre-condition contract tests on @arguments
        my @results;
        if (wantarray) {
            @results = $original_function->(@_);
        }
        else {
            my $results = $original_function->(@_);
            @results = $results;
        }
        # run the post-condition contract tests on @results
        return wantarray ? @results : $results[0];
    };
}

The idea is that the programmer sets up a "Contract" for each of A, B, C, D, and E and runs the tests with the contracts in place. These contracts are tests, and passing or failing is noted in the test suite, but we don't have to write extra tests. If I test A, the contract tests for B, C, D, and E automatically get run. If F calls B and follows the same call chain, then I write tests for F, and the tests for B, C, D, and E still automatically get run without the programmer needing to write any extra tests for this! In other words, I wind up with tests that specifically trace the flow of data through the system. Tests no longer ignore AB, BC, CD, DE, etc.
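Under that scheme, wiring a contract onto one of B's entry points might look roughly like this (B::transform and the precondition/postcondition keys are assumptions about the eventual interface, not anything that exists yet):

    # Hypothetical usage: ok() would come from Test::More, so contract
    # failures show up directly in the test output.
    assign_contract(
        'B::transform',
        precondition  => sub {
            my @args = @_;
            ok( @args == 2, 'B::transform received two arguments' );
        },
        postcondition => sub {
            my @results = @_;
            ok( defined $results[0], 'B::transform returned a defined value' );
        },
    );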

This has the benefit that we can focus our tests on integration testing and still not lose the benefits of unit testing. If C fails and we have properly defined contracts, we simply read our test output to find which of our contract tests have failed, and we have a pretty good idea of what caused the failure in C without potentially tedious debugging. Further, while the contracts impose a significant performance hit, we don't have to worry about that in the actual production system.

Once I started working on the idea, I saw some significant implementation issues, but I think they can be worked around. I can't just add the contracts to a test script because if two test scripts use the same object, I don't want to duplicate the contract. That means putting the contracts in their own file.

use Test::More tests => 32;
use Test::Contract 'contract_file';

The contract file might be a Perl script that points to a directory holding contracts for all namespaces. The problem I see with that is obvious: if I load Foo::Bar, how do I wrap its methods in contracts? I could try tying symbol tables, but tie happens at runtime and symbol table entries are often loaded at compile time. I see many problems there.

Another possible approach is to see if I can override use and require. I've never tried it, though, and I suspect it's not possible.
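For what it's worth, require can apparently be overridden through the CORE::GLOBAL mechanism, so something along these lines might be a starting point (entirely untested, and the actual wrapping step is hand-waved in the comment):

    BEGIN {
        *CORE::GLOBAL::require = sub {
            my ($file) = @_;
            my $result = CORE::require($file);
            # For 'use Foo::Bar', $file should arrive as 'Foo/Bar.pm';
            # after the real require succeeds, look up a matching
            # contract and wrap the newly loaded package's subs here.
            return $result;
        };
    }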

Finally, I could potentially have every package specify where its contract is loaded from:

package Foo::Bar;
use Contract::File 'Foo::Bar::Contract';    # I don't quite like this

With that, we could have the packages responsible for their own contracts, and the contract file would (perhaps) check to see if $ENV{TEST_CONTRACT} is set. If it is, it sets up the test contracts. If it's not, it simply returns with minimal overhead.
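The import routine for that hypothetical Contract::File could be as thin as this sketch (none of these names exist yet; it just illustrates the $ENV{TEST_CONTRACT} switch described above):

    package Contract::File;
    use strict;

    sub import {
        my ($class, $contract_package) = @_;

        # In production, TEST_CONTRACT is not set, so bail out
        # immediately with essentially no overhead.
        return unless $ENV{TEST_CONTRACT};

        # Under testing, load the contract definitions, which are
        # expected to call assign_contract() on the caller's subs.
        eval "require $contract_package; 1"
            or die "Could not load contract $contract_package: $@";
    }

    1;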

Are there other strategies for implementing this that I might be missing? Are there any holes in this idea?

Cheers,
Ovid

Looking for work. Here's my resume. Will work for food (plus salary).
New address of my CGI Course.