in reply to Re^5: Self-testing modules
in thread Self-testing modules

Most bugs arise as a result of the programmer making assumptions. If the same programmer writes the tests for the code s/he wrote, they will make the same assumptions. The net result is they write tests for every case they considered when writing the code, which all pass--giving them N of N tests (100%) passed and a hugely false sense of security.

I find that this does not happen if you're using TDD. When you only write code by producing a failing test you are forced to challenge the assumptions in your code at every stage. Every time you make something work the next stage is "how do I break this".
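
As a tiny illustration of that rhythm, here's the kind of thing I mean: the test exists and fails before the code does. (My::Text and word_count() are made-up names, just for illustration.)

    # t/word_count.t -- written before My/Text.pm exists, so it starts out failing
    use strict;
    use warnings;
    use Test::More tests => 3;

    use_ok( 'My::Text' );    # red until the module is created

    # red until word_count() is written and made to pass (green)...
    is( My::Text::word_count('the quick brown fox'), 4, 'counts words' );

    # ...at which point the next move is "how do I break this?"
    is( My::Text::word_count(''), 0, 'empty string has zero words' );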

Therefore, the only way to test code is to test its compliance against a (rigorous) specification, and derive security through statistics.

As you can probably guess I don't agree with with the "therefore" and "only" :-)

Specification-based testing is a great tool, but it's certainly not the be-all and end-all of testing. It brings its own set of good and bad points to the table, and is still affected by bad developer assumptions about the code. They're just assumptions of a different kind.

There's a whole bunch of different ways to go about testing: specification-based tests like Test::LectroTest, xUnit frameworks like Test::Class, procedural tests like the basic uses of Test::More and friends, data-driven tests like Test::Base, exploratory testing, integration testing frameworks like FIT, and so on.
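
For a flavour of the difference, here's the same check sketched two ways. (This is only an illustration; the property syntax is Test::LectroTest's, and the two snippets are separate test files.)

    # t/reverse_example.t -- example-based, with Test::More
    use strict;
    use warnings;
    use Test::More tests => 1;

    is_deeply( [ reverse reverse 1, 2, 3 ], [ 1, 2, 3 ],
        'reversing twice gives back the original (one hand-picked example)' );

    # t/reverse_property.t -- specification/property based, with Test::LectroTest
    use strict;
    use warnings;
    use Test::LectroTest;

    Property {
        ##[ l <- List( Int ) ]##
        my @twice = reverse reverse @$l;
        "@twice" eq "@$l";
    }, name => 'reversing twice gives back the original (random lists)';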

Take a look at Lessons Learned in Software Testing for a great book on the multitude of useful approaches to testing.

Not to mention practices like Test Driven Development and Design By Contract.

Picking the best tool for the work at hand is part of the job.

LectroTest isn't perfect (yet). It has fallen into the trap of becoming "expectation compliant" inasmuch as it plays the Test::Harness game of supplying lots of warm-fuzzies in the form of ok()s, and perpetuating the anomaly of reporting 99.73% passed instead of 0.27% "failed", or better still:

Well the nice thing about Perl is that if you don't like the test reporting you can always change it. In fact, since I spent a chunk of yesterday re-learning how to fiddle with Test::Harness::Straps...

.... insert sound of typing here ...

...there you go.
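
(Roughly the kind of thing Straps makes possible. The sketch below is only an illustration, assuming the 2.x-era interface where analyze_file() returns a results hash with 'max', 'ok' and 'details' keys; check the Test::Harness::Straps docs for your version.)

    # failures_only.pl -- run one test file and report only the failures
    use strict;
    use warnings;
    use Test::Harness::Straps;

    my $file  = shift @ARGV or die "usage: $0 t/some_test.t\n";
    my $strap = Test::Harness::Straps->new;

    # analyze_file() runs the file and parses its TAP output
    my %results = $strap->analyze_file( $file );
    die "no test plan seen in $file\n" unless $results{max};

    my $failed = $results{max} - $results{ok};
    if ( $failed ) {
        printf "%d of %d failed (%.2f%%)\n",
            $failed, $results{max}, 100 * $failed / $results{max};
        my $n = 0;
        for my $detail ( @{ $results{details} } ) {
            ++$n;
            printf "  not ok %d - %s\n", $n, $detail->{name} || ''
                unless $detail->{ok};
        }
    }
    else {
        print "all clear\n";
    }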

Personally, since test suites in Perl take so darn long to run, I like seeing the okays whirl past in the background since it lets me know the darn thing hasn't hung.

I realise that anything less than 100% pass at the end means I've fucked up, warm-fuzzies or not.

Preferably dropping the programmer into the debugger at the point of failure. Even more preferably, in such a way that the program can be back-stepped to the point of invocation and single stepped through the code with the failing parameters in-place so that the failure can be followed.

There are already so-called Omniscient Debugging tools available for Java. So I guess it's just a trivial matter of programming :-)
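
In the meantime a crude approximation is already possible with the stock Perl debugger: run the test under perl -d and flip $DB::single when a check fails, which stops the debugger at the next statement with the failing data still in scope. A rough sketch (frobnicate() is a made-up, deliberately buggy stand-in):

    # run as:  perl -d t/frobnicate.t
    use strict;
    use warnings;
    no warnings 'once';    # in case $DB::single would trip a "used only once" warning
    use Test::More tests => 1;

    # a deliberately buggy stand-in for the code under test
    sub frobnicate { my $sum = 0; $sum += $_ for @_; return $sum + 1 }

    my @input  = ( 1, 2, 3 );
    my $result = frobnicate( @input );

    # Under perl -d, setting $DB::single makes the debugger stop at the
    # next statement, with @input and $result still around to inspect.
    is( $result, 6, 'frobnicate sums its arguments' )
        or $DB::single = 1;

    my $debugger_stops_here = 1;    # 'x @input' or 'x $result' to poke around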

Replies are listed 'Best First'.
Re^7: Self-testing modules
by tlm (Prior) on Jul 31, 2005 at 20:45 UTC

    Most bugs arise as a result of the programmer making assumptions. If the same programmer writes the tests for the code s/he wrote, they will make the same assumptions. The net result is they write tests for every case they considered when writing the code, which all pass--giving them N of N tests (100%) passed and a hugely false sense of security.

    I find that this does not happen if you're using TDD. When you only write code by producing a failing test you are forced to challenge the assumptions in your code at every stage. Every time you make something work the next stage is "how do I break this".

    I'm still making up my mind about TDD, but for the time being I think I'm leaning towards agreeing with BrowserUk on this one. In my experience the nasty bugs come, literally, from where I least expect them, and this is crucial. There is no hope that I will somehow write a test to catch such a bug, no matter how hard I try, because the best I can do is test those aspects that I regard as potential sources of problems. And in fact, during my recent applications of TDD, some very nasty bugs have arisen despite a rigorous adherence to TDD principles. (These bugs have all become manifest after the system had "aged" a bit and attained a particular—and as it turns out ill-conditioned—state; therefore, all the simple tests that tested functions to produce expected outputs missed these "history-dependent" bugs. I'm beginning to see that the functional programming folks are on to something with their avoidance of assignment and side effects.)

    Also, I find interesting the difference between your take on TDD and that described by Kent Beck in his widely cited TDD by Example. Beck uses "test first" only as a precondition for adding functionality to his software. I.e., he says that one should not write any new code in one's application until one has written a failing test that will succeed only after the new code has been written. He makes no mention of writing tests specifically designed to make the software fail. Admittedly, one can view this sort of "stress" testing as a special case of Kent's formulation. Namely, the "functionality" one is adding is general robustness. Still I am surprised that Beck's book puts so little emphasis on this aspect of testing.

    the lowliest monk

      In my experience the nasty bugs come, literally, from where I least expect them, and this is crucial. There is no hope that I will somehow write a test to catch such a bug, no matter how hard I try, because the best I can do is test those aspects that I regard as potential sources of problem. And in fact, during my recent applications of TDD, some very nasty bugs have arisen despite a rigorous adherence to TDD principles.

      Of course I'm not saying that TDD guarantees zero bugs, no testing strategy can do that. However it's been my experience, and the experience of others, that TDD dramatically reduces the number of bugs.

      However TDD is a skill, and it takes time to learn and get good at it. It took me a good few months before I really got it. Some of the mistakes that I made were:

      • Writing more code than is necessary to make the test pass.
      • Fixing an "obvious" bug without writing a failing test first.
      • Not refactoring after every passing test.
      • Writing new code that doesn't directly make a failing test pass, especially easy to do after I'd just finished refactoring something.

      What helped me grok TDD was dropping down to insanely small increments. Write the most obviously stupid non-general code to get the test to pass as quickly as possible. Then write a test to break that really stupid code.
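
      A caricature of what those tiny increments look like, using a made-up fib() example (the comments narrate the history; the sub shown is the state after a few cycles):

          # t/fib.t -- tests accumulated one tiny increment at a time
          use strict;
          use warnings;
          use Test::More tests => 3;

          # Increment 1: this test came first and failed until fib() existed.
          # The first fib() was simply  sub fib { 1 }  -- stupid, but green.
          is( fib(1), 1, 'fib(1) is 1' );

          # Increment 2: written deliberately to break that hard-coded version,
          # forcing a slightly more general implementation.
          is( fib(4), 3, 'fib(4) is 3' );

          # Increment 3: and so on, one breaking test (and one refactor) at a time.
          is( fib(7), 13, 'fib(7) is 13' );

          # The implementation as it stands after those increments:
          sub fib {
              my $n = shift;
              return $n < 2 ? $n : fib( $n - 1 ) + fib( $n - 2 );
          }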

      (These bugs have all become manifest after the system had "aged" a bit and attained a particular—and as it turns out ill-conditioned—state; therefore, all the simple tests that tested functions to produce expected outputs missed these "history-dependent" bugs. I'm beginning to see that the functional programming folks are on to something with their avoidance of assignment and side effects.)

      Since you're still getting a large number of nasty bugs my suspicion is that there are possibly some elements of TDD that you're missing.

      Could you give a (small :-) code example that we could talk about that shows a bug that you missed during TDD?

      Also, I find interesting the difference between your take on TDD and that described by Kent Beck in his widely cited TDD by Example. Beck uses "test first" only as a precondition for adding functionality to his software. I.e., he says that one should not write any new code in one's application until one has written a failing test that will succeed only after the new code has been written. He makes no mention of writing tests specifically designed to make the software fail.

      It's the same thing from a different perspective.

      If you're using TDD then every time you write a test you should expect that test to fail. It's the test failure that drives the design/development (hence the name :-)

      With TDD you don't stop when all the tests pass, you stop when you can't write any more failing tests.

        Could you give a (small :-) code example that we could talk about that shows a bug that you missed during TDD?

        I'm going to have to owe you that one for the time being, at least as far as code goes. There are three bugs I can think of. Probably the lamest happened with the code I just recently posted. I had a battery of around 40 tests already, all succeeding, when suddenly the test suite started seg faulting while setting up tests whose fixtures were very similar to those of earlier tests.

        To make a long story short, and much simpler than it appeared originally, the problem looked like this:

            my %hash = map +( $_ => 1 ), ( 1 .. $reasonable );
            my $it   = Hash_Iterator->new( \%hash );
            $hash{ $reasonable + $_ } = 1 for 1 .. $a_few_more;
            $it->start;    # BOOM!
        Basically, my code had not taken into account the fact that as hashes grow, perl will allocate more memory and move the (now overcrowded) entries to more spacious digs. When this happened, my iterator was left with a dangling pointer, leading to the seg fault. Shame on me for not thinking about this from the beginning, but my point is that this bug was there all along, but my "simple tests", testing simple things, one-feature-at-a-time, missed it entirely. As BrowserUk said, if the programmer writes his/her own tests he/she is bound to omit the tests that would make manifest the nasty bug; the same lapse that led to the bug, leads to the missing test.
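
        Once the cause was known, the regression test was at least easy to state after the fact. Roughly something like this (simplified, with made-up sizes):

            # t/grow_after_create.t -- grow the hash *after* the iterator exists
            use strict;
            use warnings;
            use Test::More tests => 1;
            use Hash_Iterator;

            my %hash = map +( $_ => 1 ), 1 .. 8;
            my $it   = Hash_Iterator->new( \%hash );

            # Adding many more keys forces perl to reallocate the hash's buckets,
            # which is what left the iterator holding a stale pointer.
            $hash{ 8 + $_ } = 1 for 1 .. 1_000;

            # If the bug is present this doesn't merely fail -- the segfault kills
            # the script and the harness flags the missing test.
            ok( eval { $it->start; 1 }, 'iterator survives the hash growing under it' );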

        The hallmark of this and all other nasty bugs I've run into while doing TDD is that they kick in only after a particular extended sequence of manipulations that leaves the system in a state not foreseen by the programmer. Typical TDD tests tend to miss these bugs, because these tests, necessarily, tend to have very short horizons. The more elaborate the sequence of steps to bring the system to an unsound state, the less likely that the unknowing programmer will think of writing any test that will bring on the problem. (Note that the tests that elicited the bug I just described were not expected to fail the way they did. I was testing something else entirely. It was just good luck that they picked up this problem.)

        This is a very interesting subject. I have been meaning to write a meditation/book review on TDDBE. I hope to give more details then, including, hopefully, some real code.

        I think that, as you say, I probably have not quite gotten the hang of TDD yet, which accounts for some of the problems I'm having with it. But I also think that the formulation of TDD given by Beck, which is the only one I know, has been dumbed-down beyond the point of usefulness. But this is a subject that deserves more time than I can give it now.

        the lowliest monk