in reply to Re: Need advice on test output
in thread Need advice on test output

I'm not sure what you're asking. I read the output to find out which tests, if any, are failing.

If one needs this output in radically different format (one person I know needs it in XML), then they simply write a different harness which creates the output format they need. For the moment, I'm focusing on the typical "I just ran my test suite and have a bunch of stuff on the terminal." I want to know how that 'stuff' should be formatted to be most useful to you.

Cheers,
Ovid

New address of my CGI Course.

Re^3: Need advice on test output
by BrowserUk (Patriarch) on Jan 05, 2007 at 12:53 UTC

    Failed Test Stat Wstat Total Fail  List of Failed
    -------------------------------------------------------------------------------
    t/bar.t        4  1024    13    4  2 6-8
    t/foo.t        1   256    10    1  5
       (1 subtest UNEXPECTEDLY SUCCEEDED).
    Failed 2/3 test scripts. 5/33 subtests failed.
    Files=3, Tests=33,  0 wallclock secs ( 0.10 cusr +  0.01 csys =  0.11 CPU)
    Failed 2/3 test programs. 5/33 subtests failed.
    1. What is 'stat'? How does it help identify the failures?
    2. Ditto 'Wstat'?
    3. What does 'UNEXPECTEDLY SUCCEEDED' mean?

      If a test is designed to fail, then does it get reported as a failure when it does fail? Or is that an 'EXPECTED FAILURE'?

    4. Which test 'UNEXPECTEDLY SUCCEEDED'?

      If it's not important enough to tell me which one, why is it important enough to bother mentioning it at all?

    5. What is the difference between "test scripts" and "test programs"?

      And if they are the same thing, why is it necessary to give me the same information twice?

      Actually, 3 times. "Files=3, Tests=33, " is just a subset of the same information above and below it.

    6. When was the last time anyone optimised their test scripts?

      Is there any other use for that timing information?

    Of course, you'll be taking my thoughts on this with a very large pinch of salt as I do not use these tools. The above are some of the minor reasons why not.

    Much more important is that there are exactly two behaviours I need from a test harness.

    • The default, no programming, no configuration, pull it and run it, out-of-the-box behaviour is that I point at a directory of tests and it runs them. If nothing fails, it should simply say that.

      "Nothing failed" or "All tests passed".

      I have no problem with a one-line, in-place progress indicator ("\r..."), but it should not fill my screen buffer with redundant "that's ok, and that's ok, and that's ok" messages. (A sketch of the kind of thing I mean follows this list.) I use my screen buffer to remember things I've just done: the results of compile attempts, greps, etc.

      Verbose output that tells me nothing useful, whilst pushing useful information off the top of my buffer is really annoying. Yes, I could redirect it to null, but then I won't see the useful stuff when something fails.

      Converting 5/10 into a running percentage serves no purpose. A running percentage is only useful if it will allow me to predict how much longer the process will take. As the test harness doesn't know how many tests it will encounter up front, much less how long they will take, a percentage is just a meaningless number.

      If I really want this summary information, or other verbose information, (say because the tests are being run overnight by a scheduler and I'd like to see the summary information in the morning), I have no problem adding a command line switch (say -V or -V n) to obtain that information when I need it.

    • When something fails, tell me what failed and where. E.g. file and line number (not test number).

      Preferably, it should tell me which source file/line number (not test file) I need to look at, but the entire architecture of the test tools just does not allow this, which is why I will continue to embed my tests in the file under test.
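
    For illustration, a minimal sketch of the kind of one-line, in-place progress indicator meant in the first point above (the directory layout and messages are invented):

        $| = 1;                                # unbuffer STDOUT so \r updates appear
        my @scripts = glob 't/*.t';
        for my $i ( 0 .. $#scripts ) {
            printf "\rRunning %s (%d of %d)   ", $scripts[$i], $i + 1, scalar @scripts;
            # ... run the script and collect any failures here ...
        }
        print "\rAll tests passed.              \n";    # the only line left on screen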


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks for the feedback. I hope the following helps.


      What is 'stat'? How does it help identify the failures?
      'stat' is the exit code and indicates how many tests failed. However, since it doesn't report numbers in excess of 255, it's not terribly useful and I don't know that it's used.
      Ditto 'Wstat'?
      'wstat' is the wait status of the test. I also don't know how it's used and initially I didn't provide it, but I was told emphatically on the QA list that I should, so I did.
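      For anyone curious, here is a minimal sketch of where those two numbers come from in Perl (the script name is just an example):

          system( $^X, 't/foo.t' );    # run a test script as a child process
          my $wstat = $?;              # raw 16-bit wait status -- the 'Wstat' column
          my $exit  = $wstat >> 8;     # exit code, capped at 255 -- the 'Stat' column
          my $sig   = $wstat & 127;    # signal number, if the child was killed
          printf "Stat=%d Wstat=%d signal=%d\n", $exit, $wstat, $sig;
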
      What does 'UNEXPECTEDLY SUCCEEDED' mean?
      This is the number of tests which were marked TODO but passed anyway.
      If a test is designed to fail, then does it get reported as a failure when it does fail? Or is that an 'EXPECTED FAILURE'?
      A test designed to fail is generally a TODO test, and if it fails, it is not reported as a failure or as an 'EXPECTED FAILURE'.
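      For reference, a minimal sketch of what a TODO test looks like with Test::More (unfinished_feature() is an invented stub):

          use Test::More tests => 2;

          sub unfinished_feature { return 0 }    # stub standing in for real code

          ok( 1, 'ordinary test' );

          TODO: {
              local $TODO = 'feature not implemented yet';
              # A failure here is an expected failure and does not fail the suite;
              # a pass here is what the harness counts as UNEXPECTEDLY SUCCEEDED.
              ok( unfinished_feature(), 'unfinished feature works' );
          }
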
      Which test 'UNEXPECTEDLY SUCCEEDED'?
      Currently Test::Harness is not able to track or report which tests unexpectedly succeeded but TAPx::Harness can and does.
      If it's not important enough to tell me which one, why is it important enough to bother mentioning it at all?
      See the note to the previous question. It is important, but Test::Harness doesn't have this ability.
      What is the difference between "test scripts" and "test programs"?
      Nothing.
      And if they are the same thing, why is it necessary to give me the same information twice?
      I don't understand this question.
      Actually, 3 times. "Files=3, Tests=33, " is just a subset of the same information above and below it.
      It's a summary report. You may find it useful or you may not. Alternate suggestions welcome :)
      When was the last time anyone optimised their test scripts? Is there any other use for that timing information?
      I sometimes use it when I'm profiling my code and trying to optimize it. The timing information often tells me whether I've made a significant difference.
      Verbose output that tells me nothing useful, whilst pushing useful information off the top of my buffer is really annoying. Yes, I could redirect it to null, but then I won't see the useful stuff when something fails.
      That's a good point. I could easily make a 'quiet' mode which only reports overall success or failure. That would let you rerun the test suite to see what actually failed, if anything.
      Converting 5/10 into a running percentage serves no purpose.
      Agreed. I was just trying to mimic the behavior of Test::Harness. Others have pointed out that it doesn't help and I'll probably just drop it.
      When something fails, tell me what failed and where. E.g. file and line number (not test number).
      Unfortunately, the TAP format does not support this. That data is embedded in free-form diagnostics, and there is no reliable way to distinguish it from the other diagnostic output. This is a feature that is planned for TAP 2.0.
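      To illustrate, a minimal sketch of what a failing Test::More assertion emits today (the file name is just an example); the location appears only inside free-form '#' diagnostic lines, which is why a harness can't extract it reliably:

          use Test::More tests => 1;
          is( 2 + 2, 5, 'arithmetic works' );

          # On failure the harness sees something like:
          #
          #   not ok 1 - arithmetic works
          #   #   Failed test 'arithmetic works'
          #   #   at t/example.t line 2.
          #   #          got: '4'
          #   #     expected: '5'
          #
          # Everything after the 'not ok' line is an unstructured comment.
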

      I might add that the runtests utility I've written (not yet on the CPAN, but analogous to prove) allows you to specify which test harness you want to run your tests through. Thus, you can easily create a new harness to satisfy your particular needs.

      Cheers,
      Ovid

      New address of my CGI Course.

        I don't understand this question.

        In the output posted, the following three lines appear consecutively:

        Failed 2/3 test scripts. 5/33 subtests failed.
        Files=3, Tests=33,  0 wallclock secs ( 0.10 cusr +  0.01 csys =  0.11 CPU)
        Failed 2/3 test programs. 5/33 subtests failed.

        The first and last are identical except for the words "scripts" and "programs". Why repeat the same information?

        And the first two fields of the middle line,

        1. "Files=3" replicates the last digit of "Failed 2/3 test scripts".
        2. "Tests=33" replicates the last digits of "5/33 subtests failed"
        It's a summary report. You may find it useful or you may not.

        2 1/2 of those 3 lines are redundant.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Regarding the last portion of your comment: the distinction you're making is close to that between a user installing a module, and a developer running tests for his or her own code.

      FWIW: for case #1, CPANPLUS doesn't by default emit output to screen when installing a module. It only tells you when something went wrong. A test harness could be made to know how many tests it will run, and even conceivably to have an estimate of how long they are expected to take relative to each other, if this information is stored when the maintainer creates the distribution.
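      One hedged sketch of how a harness (or the distribution build step) might learn the planned test count up front: scan each test file for its declared plan. The t/ directory and the plan regex here are assumptions, and scripts that use 'no_plan' would defeat it.

          use File::Find;

          my $planned = 0;
          find( sub {
              return unless /\.t\z/;                       # only test scripts
              open my $fh, '<', $_ or return;
              while ( my $line = <$fh> ) {
                  $planned += $1 if $line =~ /\btests\s*=>\s*(\d+)/;
              }
          }, 't' );

          print "Expecting roughly $planned tests in total\n";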

      Regarding case #2, the developer, Pugs' Test.pm does in fact produce coordinates for test cases. We use this to cross-link information in the smoke matrices with the actual test code. There's no reason that I know of why this couldn't be ported to the Perl 5 testing frameworks.

      File and line number makes no sense in the context of a failed test. Consider:
      1: for my $key ( sort keys %tests ) {
      2:     is( $tests{$key}, $wooby{$key} );
      3: }
      If you know that the test failed on line 2, that doesn't help you much.

      xoxo,
      Andy

        File and line number makes no sense in the context of a failed test.

        Au contraire. I'd at least have a starting point, even in this somewhat contrived example.

        In more normal cases, about 95% of those test scripts I've looked at, which consist of long linear lists of unnumbered ok()s and nok()s, having the line number of the failing test would save me from having to play that most ridiculous of games: count the tests. Are they numbered from zero or one? Does a TODO count or not? Do tests that exist inside runtime conditional if blocks count if the runtime condition fails? If not, how can I know whether that runtime condition was true or false? Etc.

        Of course, in this case I'd need other information too. But then in this case, the test number would be of no direct benefit either. In this case I'd have to modify the .t file to print out a sorted list of the keys to %tests at runtime, as there would be no other way to work out which test related to test N.

        Oh damn! But then tracing stuff out from within a test script is a no-no, because the test tools usurp STDOUT and STDERR for their own purposes, taking away the single most useful, and most used, debugging facility known to programmer kind: print.

        And there you have it, today's number one reason I do not use these artificial, overengineered, maniacally OO test tools. They make debugging and tracing the test script 10 times harder than doing so for the scripts they are meant to test.

        They are an unbelievably blunt instrument, whose basic purpose is to display and count the number of boolean yeses and nos. To perform this simple function,

        • they usurp the simplest and best debugging tool available.
        • force me to divorce my tests from the code under test.
        • curtail my ability to use debuggers.
        • wrap several layers of complication around the debugging process.
        • and throw away reams of useful--I would say vital--information in the process.

        And all of this so as to produce a bunch of 'pretty pictures and statistics' that I have no use for and have to use yet another layer (the test harness) to sift and filter to produce the only statistic I am interested in.

        What failed and where?

        For all the world this reminds me of those food ads and packaging that proclaim to the world; "Product X is 95% fat free!". Ugh. You mean that 5% of that crap is fat?

        To date, the best testing tool I've seen available is Smart::Comments. Its require, assert, ensure, insist, check, confirm, and verify assertions are amazingly simple and amazingly powerful. (A short sketch follows the list below.)

        • These allow me to place the tests right there in the code being tested.
        • One file including code and tests.
        • When failures occur, I get useful information, including but not limited to the file and line number where the failing test occurred.
        • They are easily and quickly enabled & disabled by the addition of a single comment card at the top of the code.
        • I can enable them on a per file basis and so only test that code I am interested in and not wait for all the tests I'm not interested in to execute first.
        • I can have multiple levels of test that allow me to use a coarse granularity of tests to home in on the failure and then fine granularity to isolate the exact point of failure--in the code that is being tested, not some third-party test script.
        • Most importantly of all: they allow me to take some user's testcase that is causing failures in my code, turn the tracing and debugging on in my module(s), run that user script, and see the results.

          This is immediate and accurate.

          I do not have to modify the user-supplied testcase in any way. And that is the holy grail of testing: run the user script, unmodified, on my system, with debugging enabled within my modules only.

          And if the user's testcase has a bunch of complex dependencies that I do not or cannot have, I can instruct the user to go into his copy of my modules and delete one character, and all of my tests are enabled. He can then run his testcase in his environment and supply the output to me, and I can see exactly what went on.

          This is priceless!

        • Finally, when testing is complete, because it is a source filter, commenting out the use line means that all of it--every single bit of the test code, the overhead, the setup, everything--gets the hell outta Dodge. It is simply gone.
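
        For anyone who hasn't seen it, a minimal sketch of the style of in-source assertion described above (the mean() sub is invented; the directives are standard Smart::Comments syntax as I understand it, and a failed assertion throws an exception reporting the expression, its variables, and the location):

            use Smart::Comments;    # comment out this one line and every test vanishes

            sub mean {
                my @values = @_;
                ### require: @values > 0
                my $sum = 0;
                $sum += $_ for @values;
                my $mean = $sum / @values;
                ### check: $mean >= 0
                return $mean;
            }

            mean();    # fails the 'require:' assertion and reports where it failed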

        Smart::Comments is the single most useful and most underrated module that theDamian (and possibly anyone) has yet posted to CPAN. I recognised the usefulness of the concept long ago when I came across Devel::StealthDebug, which may or may not have been the inspiration for Smart::Comments. In use, the former proved to be somewhat flaky, but theDamian has worked his usual magic with the concept (whether it was the inspiration for it or not) and come up with a real, and as yet unrecognised, winner.

        To achieve the perfect test harness,

        1. Supply a patch to Smart::Comments that allows it to be enabled/disabled via an environment variable (with the default being OFF). A rough sketch of this follows the list.
        2. Also patch it so that, with an appropriate setting in that environment variable, failing asserts et al. become non-fatal: they log the assertion failure (warn-style, but with a Carp::cluck-style traceback) and allow the code to continue (as it would in production environments).

          The information, as logged for failure, would also be logged for success in this mode of operation.

        3. Write a test harness application to parse that output and produce whatever statistics and summary information is useful.
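
        As a rough sketch of item 1, using only pieces I'm sure exist (the core 'if' pragma; the SMART_COMMENTS_ON variable name is invented), something like this gates the filter behind an environment variable, defaulting to off. Whether the source filter composes cleanly with 'use if' in every case is something I'd want to verify:

            # Load the Smart::Comments source filter only when the (invented)
            # environment variable is set; otherwise the assertions stay plain comments.
            use if $ENV{SMART_COMMENTS_ON}, 'Smart::Comments';

            sub frobnicate {
                my ($n) = @_;
                ### assert: defined $n && $n >= 0
                return $n * 2;
            }

            print frobnicate(21), "\n";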

        Why haven't I written it yet? Because I keep hoping that Perl6, oops, Perl 6 is 'just around the corner', and that Smart::Comments will be built-in.

        Of course, a few additional modules wouldn't go amiss: Smart::Comments::DeepCompare, Smart::Comments::LieToTheCaller, and a few others. But mostly it's all right there.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.