in reply to Re^4: Self-testing modules
in thread Self-testing modules

See Test::LectroTest for the only CPAN test module (I consider) worthy of the Test:: prefix.

Simplified rationale:

Most bugs arise as a result of the programmer making assumptions. If the same programmer writes the tests for the code s/he wrote, they will make the same assumptions. The net result is that they write tests for every case they considered when writing the code, all of which pass--giving them N of N tests (100%) passed and a hugely false sense of security.

The cases they fail to test for are the same cases they failed to consider when writing the code, and those are normally the same cases that crop up as soon as they demo it or put it into production.

With anything other than the most trivial of functions, hoping to test all possible combinations of inputs and verify their outputs is forlorn. I.e. impossible in any practical sense of the term.

Therefore, the only way to test code is to test its compliance against a (rigorous) specification, and derive security through statistics. Ideally, this would go one step further than LectroTest and retain a record of failing values, which would be reused (along with a new batch of randomly generated ones) at each subsequent test cycle. (IMO) this is the only way that testing will be lifted out of its finger-to-the-wind, guesswork state and move into something approaching a science.
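
To make that concrete, here is a minimal sketch in the style of Test::LectroTest's documented interface (MyModule and my_add are hypothetical stand-ins for the code under test):

    use MyModule;          # hypothetical module under test
    use Test::LectroTest;

    # State a property that must hold for *all* inputs; LectroTest then
    # generates hundreds of random cases and reports any counterexample.
    Property {
        ##[ x <- Int, y <- Int ]##
        MyModule::my_add( $x, $y ) == MyModule::my_add( $y, $x );
    }, name => "my_add is commutative";

The security comes not from any single hand-picked case, but from the statistical weight of many randomly chosen ones.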

LectroTest isn't perfect (yet). It has fallen into the trap of becoming "expectation compliant", inasmuch as it plays the Test::Harness game of supplying lots of warm fuzzies in the form of ok()s and perpetuating the anomaly of reporting 99.73% passed instead of 0.27% "failed", or better still:

***FAIL*** Line nnn of xxxxxx.pl failed running function( 1, 2, 3 ); Testing halted.

Preferably dropping the programmer into the debugger at the point of failure. Even more preferably, in such a way that the program can be back-stepped to the point of invocation and single-stepped through the code with the failing parameters in place so that the failure can be followed.
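
Perl's debugger offers at least a crude hook in that direction: under perl -d, assigning to $DB::single stops execution at the next statement, so a hypothetical check() helper could pause with the failing values still in scope (back-stepping is, alas, another matter):

    # Run the test script under the debugger:  perl -d t/mytest.t
    sub check {
        my ( $got, $expected, $name ) = @_;
        return 1 if $got eq $expected;
        warn "***FAIL*** $name: got '$got', expected '$expected'\n";
        $DB::single = 1;   # under -d, halt here so the state can be inspected
        return 0;
    }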


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Re^6: Self-testing modules
by adrianh (Chancellor) on Jul 30, 2005 at 14:43 UTC
    Most bugs arise as a result of the programmer making assumptions. If the same programmer writes the tests for the code s/he wrote, they will make the same assumptions. The net result is that they write tests for every case they considered when writing the code, all of which pass--giving them N of N tests (100%) passed and a hugely false sense of security.

    I find that this does not happen if you're using TDD. When you only write code by producing a failing test, you are forced to challenge the assumptions in your code at every stage. Every time you make something work, the next stage is "how do I break this?".
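
    For example (a sketch only; slugify and its behaviour are invented for illustration), the loop starts with a test that is watched to fail before any code is written to make it pass:

        use strict;
        use warnings;
        use Test::More tests => 2;
        use MyApp::Util qw( slugify );   # doesn't handle these cases yet

        # Written first, watched to fail, *then* the code is changed to pass.
        is( slugify('Hello, World!'), 'hello-world', 'punctuation stripped' );
        is( slugify(''),              '',            'empty string survives' );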

    Therefore, the only way to test code is to test its compliance against a (rigorous) specification, and derive security through statistics.

    As you can probably guess I don't agree with the "therefore" and "only" :-)

    Specification-based testing is a great tool, but it's certainly not the be-all and end-all of testing. It brings its own set of good and bad points to the table, and is still affected by bad developer assumptions about the code. They're just assumptions of a different kind.

    There's a whole bunch of different ways to go about testing: specification-based tests like Test::LectroTest, xUnit frameworks like Test::Class, procedural tests like the basic uses of Test::More and friends, data-driven tests like Test::Base, exploratory testing, integration testing frameworks like FIT, etc.
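
    For anyone who hasn't met the xUnit style, a minimal Test::Class sketch looks something like this (MyModule is a hypothetical module under test):

        package MyModule::Test;
        use base qw( Test::Class );
        use Test::More;
        use MyModule;

        sub setup : Test(setup) {        # runs before every test method
            my $self = shift;
            $self->{obj} = MyModule->new;
        }

        sub creation : Test(1) {
            my $self = shift;
            isa_ok( $self->{obj}, 'MyModule' );
        }

        MyModule::Test->runtests;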

    Take a look at Lessons Learned in Software Testing for a great book on the multitude of useful approaches to testing.

    Not to mention practices like Test Driven Development and Design By Contract.

    Picking the best tool for the work at hand is part of the job.

    LectroTest isn't perfect (yet). It has fallen into the trap of becoming "expectation compliant", inasmuch as it plays the Test::Harness game of supplying lots of warm fuzzies in the form of ok()s and perpetuating the anomaly of reporting 99.73% passed instead of 0.27% "failed", or better still:

    Well the nice thing about Perl is that if you don't like the test reporting you can always change it. In fact, since I spent a chunk of yesterday re-learning how to fiddle with Test::Harness::Straps...

    .... insert sound of typing here ...

    ...there you go.
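
    The gist of it, going by the Test::Harness::Straps documentation of the day (the result-hash keys shown here may well differ between Test::Harness versions):

        use Test::Harness::Straps;

        my $strap   = Test::Harness::Straps->new;
        my %results = $strap->analyze_file( 't/some_test.t' );

        # Report only the failures, rather than a parade of ok()s.
        for my $detail ( @{ $results{details} } ) {
            next if $detail->{ok};
            print "***FAIL*** $detail->{name}\n";
        }
        print $results{passing} ? "all tests passed\n" : "FAILURES above\n";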

    Personally, since test suites in Perl take so darn long to run, I like seeing the okays whirl past in the background since it lets me know the darn thing hasn't hung.

    I realise that anything less than 100% pass at the end means I've fucked up, warm-fuzzies or not.

    Preferably dropping the programmer into the debugger at the point of failure. Even more preferably, in such a way that the program can be back-stepped to the point of invocation and single-stepped through the code with the failing parameters in place so that the failure can be followed.

    There are already so-called Omniscient Debugging tools available for Java. So I guess it's just a trivial matter of programming :-)

      Most bugs arise as a result of the programmer making assumptions. If the same programmer writes the tests for the code s/he wrote, they will make the same assumptions. The net result is that they write tests for every case they considered when writing the code, all of which pass--giving them N of N tests (100%) passed and a hugely false sense of security.

      I find that this does not happen if you're using TDD. When you only write code by producing a failing test, you are forced to challenge the assumptions in your code at every stage. Every time you make something work, the next stage is "how do I break this?".

      I'm still making up my mind about TDD, but for the time being I think I'm leaning towards agreeing with BrowserUk on this one. In my experience the nasty bugs come, literally, from where I least expect them, and this is crucial. There is no hope that I will somehow write a test to catch such a bug, no matter how hard I try, because the best I can do is test those aspects that I regard as potential sources of problems. And in fact, during my recent applications of TDD, some very nasty bugs have arisen despite a rigorous adherence to TDD principles. (These bugs have all become manifest after the system had "aged" a bit and attained a particular (and, as it turns out, ill-conditioned) state; therefore, all the simple tests that tested functions to produce expected outputs missed these "history-dependent" bugs. I'm beginning to see that the functional programming folks are on to something with their avoidance of assignment and side effects.)
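
      A contrived sketch of what I mean (Counter and first_time are invented for illustration): the per-call checks I'd naturally write both pass; only a test that happens to replay the right history exposes the stale state, and that is precisely the test I never think to write:

          package Counter;

          sub new { bless { seen => {} }, shift }

          # Returns 1 the first time it sees a key, 0 on any later call.
          sub first_time {
              my ( $self, $key ) = @_;
              return $self->{seen}{$key}++ ? 0 : 1;
          }

          # Bug: clears the wrong slot, so history quietly survives a reset().
          sub reset { my $self = shift; $self->{cache} = {}; return }

          package main;
          use Test::More tests => 3;

          my $c = Counter->new;
          is( $c->first_time('x'), 1, 'x is new' );           # passes
          is( $c->first_time('x'), 0, 'x seen before' );      # passes
          $c->reset;
          is( $c->first_time('x'), 1, 'fresh after reset' );  # fails: stale state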

      Also, I find interesting the difference between your take on TDD and that described by Kent Beck in his widely cited TDD by Example. Beck uses "test first" only as a precondition for adding functionality to his software. I.e., he says that one should not write any new code in one's application until one has written a failing test that will succeed only after the new code has been written. He makes no mention of writing tests specifically designed to make the software fail. Admittedly, one can view this sort of "stress" testing as a special case of Kent's formulation. Namely, the "functionality" one is adding is general robustness. Still, I am surprised that Beck's book puts so little emphasis on this aspect of testing.

      the lowliest monk

        In my experience the nasty bugs come, literally, from where I least expect them, and this is crucial. There is no hope that I will somehow write a test to catch such a bug, no matter how hard I try, because the best I can do is test those aspects that I regard as potential sources of problems. And in fact, during my recent applications of TDD, some very nasty bugs have arisen despite a rigorous adherence to TDD principles.

        Of course I'm not saying that TDD guarantees zero bugs; no testing strategy can do that. However, it's been my experience, and the experience of others, that TDD dramatically reduces the number of bugs.

        However, TDD is a skill, and it takes time to learn and get good at it. It took me a good few months before I really got it. Some of the mistakes that I made were:

        • Writing more code than is necessary to make the test pass.
        • Fixing an "obvious" bug without writing a failing test first.
        • Not refactoring after every passing test.
        • Writing new code that doesn't directly make a failing test pass, which was especially easy to do after I'd just finished refactoring something.

        What helped me grok TDD was dropping down to insanely small increments. Write the most obviously stupid non-general code to get the test to pass as quickly as possible. Then write a test to break that really stupid code.
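
        Something like this, at the silliest extreme (add() is just a toy):

            use Test::More tests => 2;

            # Step 1: a failing test, then the most stupidly specific code to pass it.
            sub add { return 4 }                  # "works" for the first test only
            is( add( 2, 2 ), 4, 'adds 2 and 2' );

            # Step 2: a test deliberately written to break that stupid code...
            is( add( 1, 2 ), 3, 'adds 1 and 2' );
            # ...which fails until add() is generalised: sub add { $_[0] + $_[1] }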

        (These bugs have all become manifest after the system had "aged" a bit and attained a particular (and, as it turns out, ill-conditioned) state; therefore, all the simple tests that tested functions to produce expected outputs missed these "history-dependent" bugs. I'm beginning to see that the functional programming folks are on to something with their avoidance of assignment and side effects.)

        Since you're still getting a large number of nasty bugs, my suspicion is that there are possibly some elements of TDD that you're missing.

        Could you give a (small :-) code example that we could talk about that shows a bug that you missed during TDD?

        Also, I find interesting the difference between your take on TDD and that described by Kent Beck in his widely cited TDD by Example. Beck uses "test first" only as a precondition for adding functionality to his software. I.e., he says that one should not write any new code in one's application until one has written a failing test that will succeed only after the new code has been written. He makes no mention of writing tests specifically designed to make the software fail.

        It's the same thing from a different perspective.

        If you're using TDD then every time you write a test you should expect that test to fail. It's the test failure that drives the design/development (hence the name :-)

        With TDD you don't stop when all the tests pass, you stop when you can't write any more failing tests.