Re: How does one avoid tautologies in testing?
by moritz (Cardinal) on Jul 15, 2009 at 15:43 UTC
Sometimes you do have to test for tautologies, because they describe basic sanity. For example, the Perl 6 test suite contains tests such as
ok True, 'True is, well, True';
if True {
    pass('if True works');
} else {
    flunk('if True works');
}
You wouldn't believe how many ways there are to screw up a compiler, and the more basic the failing test, the easier it is to diagnose, even if the tests look like tautologies.
That aside, this got me started:
use an alternate algorithm to extract the data, preferably one that uses more hard coded data than the algorithm being tested.
Why should your testing algorithm extract anything from anywhere? A test that re-implements parts of the tested algorithm is always a bad test.
Just make up examples that don't need any extracting.
For example, if you have a sub
sub s {
    my $str = shift;
    return split /:/, $str;
}
in my mind, the way of testing this "by extracting" is
is_deeply [&s("a:b")], [split /:/, 'a:b'], $msg;
which is of course a bad idea (note the leading &: a bare s(...) would parse as the substitution operator). Testing, by definition, works with examples, so a better way to test it would be
is_deeply [&s("a:b")], [qw(a b)], $msg;
I feel a bit stupid writing this to you, since you surely know this much about testing already, but maybe it helps anyway :-)
The second thing that you can always do is to test structures, not values. For example, in the case of the function above, you could test that the number of list items is one greater than the number of colons in the source string, you could test the length of the join()ed result array, etc. Since I don't know your BUCOs, I don't know how feasible that is.
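A minimal sketch of that idea, with a made-up input string:

use strict;
use warnings;
use Test::More tests => 2;

sub s { my $str = shift; return split /:/, $str }   # the sub from above

my $input  = 'x:yy:zzz';                 # made-up example input
my $colons = () = $input =~ /:/g;        # count the colons in the source
my @parts  = &s($input);                 # & again: bare s(...) is the substitution op

is scalar(@parts), $colons + 1, 'item count is one greater than colon count';
is join(':', @parts), $input,   'join()ing the parts restores the input';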
Thank you for such a thoughtful response. I'm really glad you weren't afraid to write for fear of feeling stupid. Thinking about what you said gave me a couple of ideas to explore.
You raise a really good point about structural testing. I'm not sure either how it applies to my current project, but it is worth spending some time on figuring out how. I've definitely used that technique before, and it lends itself easily to random input testing. The most memorable time was when I was testing a calculation engine. In addition to the usual hard-coded tests for inputs and outputs, I added a bunch of tests that verified that the operators complied with certain provable relations for real numbers (fields), for example, that -(-x) = x or a+b = b+a. After "anchoring" test results using literal values as input/output pairs, I threw a large number of randomly generated inputs at the engine and verified that the operators continued to be well behaved. I enjoyed that a lot. (I also learned a lot more than I ever wanted to know about the difference between numbers as a mathematical concept and the computer representations of numbers.)
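In Perl, that style of test might look something like the following sketch; the engine functions here are just hypothetical stand-ins for the real calculation engine:

use strict;
use warnings;
use Test::More;

# Hypothetical stand-ins for the calculation engine's operators.
sub engine_add    { my ($x, $y) = @_; return $x + $y }
sub engine_negate { my ($x) = @_; return -$x }

# Anchor with literal input/output pairs first...
is engine_add(2, 3), 5, 'anchor: 2 + 3 == 5';

# ...then verify the field relations on random inputs. Negation and
# commutativity of + happen to be exact even in floating point;
# relations like associativity are not, which is where the lessons
# about computer representations of numbers come in.
for (1 .. 1000) {
    my ($x, $y) = map { rand(2e6) - 1e6 } 1 .. 2;
    is engine_negate(engine_negate($x)), $x, "-(-$x) == $x";
    is engine_add($x, $y), engine_add($y, $x), "$x + $y == $y + $x";
}
done_testing();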
I totally agree with your discomfort about 'extracted data' and that was a large part of the reason I posted. However, your reading of my first option (alternate algorithm) seems closer to what I meant by the second option (recode from spec). By "alternate algorithm" I meant finding a genuinely different algorithm. In logic, it isn't that different from a technique we were both taught in Physics 101 or even earlier - if there are two ways to solve a problem, check your work by solving the problem both ways. Unfortunately, there isn't always a good second algorithm.
By extracting, I meant something different from your split example, which I agree is a bad idea. Instead I meant using information from a mock environment and combining it with made up information to produce a result. Perhaps 'extraction' was a bad choice of words.
Consider the example given above of converting the relative path to a fully qualified path. I can and did make up the relative path portion of the file names. But I couldn't just make up a fully qualified path because in that particular test suite, one test verified that files were created and deleted according to spec. Another test verified that file search filters indeed selected the correct files from the file system. A third test verified that one of my BUCOs correctly generated a family of subdirectories and bound the correct (in memory) object to each subdirectory. This isn't a version control system, but parts of the application do interact with the file system in ways that are as complex. (Aside: I wonder how subversion and git do their testing...surely they have similar issues?)
So the root has to be a real root on a real computer, and I needed an assortment of real files to be searched for and selected within that root directory. I see only two ways of safely creating a real root directory for tests that doesn't conflict with any existing directory: (a) run the tests on a clean virtual machine where I can safely control exactly what files already exist, or (b) generate a random temporary directory on an existing machine (and autodelete it).
The first, a virtual machine, would allow me to make things up entirely, but it would have setup costs that would make it unrealistic for repeated debug-patch cycles. A test suite that requires a virtual machine is also not very appropriate for CPAN modules. I'd like a solution (or at least a philosophy) that could be used for anything I decide to release on CPAN.
The second option, generating a random root directory, can be easily and quickly set up, but runs the risk of calculation errors. It means that I have to concatenate the root to one or more made-up relative paths. Concatenating a root with a relative path may be a trivial algorithm, but, all the same, it is definitely vulnerable to stupid typos.
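For the setup itself, File::Temp can do the safe, auto-deleted root; a minimal sketch (the relative path is made up, and the hand-rolled concatenation is exactly the part I worry about):

use strict;
use warnings;
use File::Temp qw(tempdir);

# Option (b): a random root that cannot collide with any existing
# directory and is deleted automatically when the tests exit.
my $root = tempdir('buco_test_XXXXXX', TMPDIR => 1, CLEANUP => 1);

# The risky part: gluing the root onto made-up relative paths by hand.
my $full = "$root/config/made_up.cfg";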
Both options have risks, and at the end of the day we have a trade-off between different risks (non-use of test vs. inaccuracy of test). But if you can think up a way to do the kind of testing I just described using only made-up hard-coded values and no virtual machine, I won't object :-).
Best, beth
The second option, generating a random root directory, can be easily and quickly set up, but runs the risk of calculation errors. It means that I have to concatenate the root to one or more made-up relative paths. Concatenating a root with a relative path may be a trivial algorithm, but, all the same, it is definitely vulnerable to stupid typos.
You're right there, but your options aren't totally exhausted here (a sketch of some of these follows the list):
- Rely on an existing, well-tested solution to do the concatenation for you.
- You might be able to chdir into the newly created root dir and use only relative paths.
- Maybe your API has an option to hand back the fully qualified paths. If that's the case, you can, for example, check that the file exists (-e), has some content (-s > 0), maybe even has the right magic numbers, etc.
- Another option, if you get back the fully qualified paths, is to check whether the made-up relative part of the path is a substring of the absolute path.
- Yet another option is to recursively search through your root, looking for files of the right name, or maybe just comparing the total number of files before and after a create operation.
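A sketch of a few of these options together; File::Spec does the concatenation, and the checks use only what comes back from the file system (the file itself is faked here so the sketch is self-contained):

use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;
use Test::More;

my $root = tempdir(CLEANUP => 1);

# Fake the work of the code under test, just for this sketch:
my $rel  = File::Spec->catfile('made', 'up', 'path.txt');
my $full = File::Spec->catfile($root, $rel);
make_path(File::Spec->catdir($root, 'made', 'up'));
open my $fh, '>', $full or die $!;
print {$fh} "some content\n";
close $fh;

# chdir into the root and use only relative paths.
chdir $root or die "chdir: $!";
ok -e $rel,     'file exists (-e)';
ok -s $rel > 0, 'file has content (-s > 0)';

# The made-up relative part is a substring of the absolute path.
like $full, qr/\Q$rel\E$/, 'relative part is a suffix of the absolute path';

done_testing();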
Yes, you give up some precision in testing, but if you do fear the tautologies that might be a worthy trade-off. Or not, depending on your situation of course.
(Aside: I wonder how subversion and git do their testing...surely they have similar issues?)
SVN has different storage backends, and I guess you can pretty much black-box test them through their API. You pump in lots of new revisions of entirely made-up data and check whether you can later retrieve it. Nobody really cares which file the backend puts which data in.
I guess with git it's pretty much the same: when you add a commit, there's no guarantee which packfile it will end up in; things like git-repack will change that anyway. The important thing is just that after the write the files on disk still conform to their file format, and that the data can be retrieved again.
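For instance, a black-box round trip through the real git command line might look like this sketch:

use strict;
use warnings;
use File::Temp qw(tempdir);
use Test::More;

# Black-box round trip: commit made-up data, read it back through the
# same public interface, never caring which packfile it ends up in.
my $repo = tempdir(CLEANUP => 1);
chdir $repo or die "chdir: $!";

for my $cmd (['git', 'init', '-q'],
             ['git', 'config', 'user.email', 'test@example.com'],
             ['git', 'config', 'user.name',  'Test']) {
    system(@$cmd) == 0 or die "@$cmd failed";
}

my $content = "entirely made-up data\n";
open my $fh, '>', 'data.txt' or die $!;
print {$fh} $content;
close $fh;

system('git', 'add', 'data.txt') == 0            or die 'git add failed';
system('git', 'commit', '-q', '-m', 'test') == 0 or die 'git commit failed';

my $round_tripped = `git show HEAD:data.txt`;
is $round_tripped, $content, 'data survives the round trip';
done_testing();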
Re: How does one avoid tautologies in testing?
by gwadej (Chaplain) on Jul 15, 2009 at 16:48 UTC
It's important to keep in mind that unit testing often serves multiple purposes. Partially, the tests serve to understand and validate the code. They also help document usage. They (hopefully) also get used for regression testing. I find tautological tests (let's call them Ttests) are most useful in this final case.
moritz hits the nail on the head with the comments about the Perl 6 testing. The purpose of these simple Ttests is to make sure something fundamental has not changed. If (after your tests have been in use for a while) one of these Ttests fails, you know immediately that something fundamental in the system has been broken.
For example, a test like this may save you someday when someone decides that some caching or lazy-loading solution can be added because it "could not possibly break anything".
Some of the longest debugging sessions I've ever been part of were finally resolved when we tested something that could not possibly be wrong and it was. Every time I put in a Ttest, I think about these sessions. Every now and then, I catch a bug this way.
Re: How does one avoid tautologies in testing?
by LanX (Saint) on Jul 15, 2009 at 22:03 UTC
Isn't the act of testing by definition a redundancy, and therefore aren't tautologies unavoidable?
I think even the name is wrong; shouldn't it be called "assuring" instead of "testing"?
For example: if I try whether a knife cuts a piece of paper before I buy it in a shop, I'm testing this unknown knife. Who knows whether it's sharp otherwise?
But if I try my knife just after I sharpened it, I'm assuring myself that it's sharp, even though I should already know it is, and it's redundant to do so!
Don't be too critical about your test suite! Only when you encounter a new bug which wasn't found by your tests can you be sure that your tests haven't been sufficient. And only by adding a new test covering this bug can you approximate the ideal test suite. It's a dynamic process.
Philosophical thoughts aside, can't you just automatically record real live "BUCOs" and reuse them afterwards, instead of generating them artificially?
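A sketch of what I mean; the handler names are made up, and Storable does the persisting:

use strict;
use warnings;
use Storable qw(nstore retrieve);

sub handle_buco { my ($b) = @_; return { processed => $b } }   # stand-in for the real code

# Recording wrapper: every real live BUCO that passes through gets
# saved together with its result, for later replay in the test suite.
my @recorded;
sub handle_buco_and_record {
    my ($buco) = @_;
    my $result = handle_buco($buco);
    push @recorded, { input => $buco, output => $result };
    nstore \@recorded, 'buco_fixtures.sto';
    return $result;
}

# Replaying in the test suite would then look like:
# use Test::More;
# for my $case (@{ retrieve('buco_fixtures.sto') }) {
#     is_deeply handle_buco($case->{input}), $case->{output}, 'replayed BUCO';
# }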
Isn't the act of testing by definition a redundancy, and therefore aren't tautologies unavoidable?
No, testing in the sense that it's used in programming is checking actual results against expected results.
A tautology is a logic expression or formula that's always true - not only when your program is working, but also when it's broken.
For example, the addition operator + is rather primitive, and you're hard-pressed to test it in terms of even more primitive operations. If you accidentally test + in terms of -, and for the CPU that's the same operation, your test will always pass even if the arithmetic unit in your CPU is broken - as long as it's still deterministic.
However, if you test it in terms of examples, a broken + can be detected - no tautology here.
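To make the difference concrete, a minimal illustration:

use Test::More tests => 2;

# Tautology: both sides go through the same (possibly broken) adder,
# so this passes no matter what the hardware does.
is 2 + 2, 2 + 2, 'tautological - always passes';

# Example: the expected value is a literal, so a broken adder can
# actually be caught.
is 2 + 2, 4, 'example-based - can fail';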
For example: if I try whether a knife cuts a piece of paper before I buy it in a shop, I'm testing this unknown knife. Who knows whether it's sharp otherwise?
But if I try my knife just after I sharpened it, I'm assuring myself that it's sharp, even though I should already know it is, and it's redundant to do so!
Not all tests are regression tests.
Re: How does one avoid tautologies in testing?
by dHarry (Abbot) on Jul 16, 2009 at 10:22 UTC
(With the risk of telling you what you already know.)
I think articles can be found in the Software Engineering literature on the topic of “Tautology Testing”. I even encountered approaches based upon it: e.g. Tautology Based Development (TBD) or Tautology Test Driven Development (TTDD)?!
A tautology test asserts that the code does what the code does?! I have my doubts. I have difficulty appreciating tautology tests except for maybe some really specialized cases. For normal business applications it sounds to me like overtesting and therefore a waste of money. Money that might be spent on other SQA activities, like static testing, to gain confidence in the code. How to avoid it? Maybe Injection Testing? See here for some info. My personal approach is to try to prevent it, and when I do see it, throw it out.
Although I do test the SW I write before I throw it over the proverbial wall (normally unit tests and an integration test, preferably in a representative test environment), I prefer/demand that other people test it as well. It won’t be the first time that I keep reading over my own mistake and simply fail to see it. IMO software testing should provide an objective and therefore independent view of the quality of the SW; the more other people test your code, the better.
Testing is an engineering discipline in its own right. Years ago I hired an independent test consultant from a company specialized in quality. The model used was V2M2. This was a real eye-opener for me, and I think it’s safe to say the project benefited a lot from it. I especially liked the (good) test coverage and traceability to the requirements. Whenever possible I follow this approach, i.e. outsource the testing as much as possible.
Cheers
Harry
Yeah - I saw that TBD stuff. But others haven't and the purpose of posting a node on Perl Monks is to create a discussion from which we can all, not just the OP, learn.
Under ideal circumstances I prefer to have different people do the writing and the testing - it also helps a lot in identifying documentation errors and fuzzy specs. Often the person who does the coding is so close to the problem that they are unaware of their implicit assumptions. But small teams don't always have that luxury. Given that much important software innovation comes from under-resourced start-ups, skunkworks teams within corporations, and open-source projects, I think it is important to develop testing philosophies that work for teams both large and small.
As for tautology testing I have mixed feelings. As gwadej pointed out (and LanX echoed) there are other reasons for testing (regression, crash testing) and they are very important. If your software is going to have a life cycle with new features and patches then regression testing is reason enough to pay the cost of test development.
As I ponder the discussion so far, I'm beginning to realize that many things that seem like tautologies are not actually tautologies. It all depends on what you are using the test for and why. As long as we are very clear on what the test can and cannot verify, the test may still be valuable.
For example, as moritz discusses, testing that "true is, well, true" is still a valuable test if you are testing a compiler because there are an unbelievable number of ways to screw up a compiler. The test seems like a tautology only when we discount the amount of processing involved for a compiler to decide that True is true. gwadej makes a similar point when he discusses debugging things that "could not possibly be wrong".
Or take an even more extreme example, also given by moritz: testing sub s. If the purpose of the test is to verify that something is correctly split then using split to test a sub that calls split is a very bad idea. However, the purpose of such a test may be something very different.
Suppose you have API documentation that says that parameters should be delivered in a particular order, and you want to verify that sub s indeed expects those parameters. A true Ttest (to borrow gwadej's term) would pass even when the code is wrong. But parameters delivered in the wrong order will not pass! Hence, for purposes of parameter-order testing, even moritz's sub s/split example isn't a tautology.
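To illustrate with a made-up two-argument variant of moritz's example:

use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical: the API docs promise (string, separator), in that order.
sub split_on {
    my ($str, $sep) = @_;
    return split /\Q$sep\E/, $str;
}

# Re-implementing the sub with split looks tautological, but it pins
# down the documented parameter order: if split_on() were changed to
# expect ($sep, $str) instead, this test would fail.
is_deeply [ split_on('a:b', ':') ], [ split /:/, 'a:b' ],
    'parameters are accepted in the documented order';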
In conclusion, I think at least three questions need to be answered to decide if Ttesting is overtesting:
- Is the test valuable for reasons other than conformance? (e.g. regression)
- Is the test a true tautology, and if not, what exactly is it testing that is not tautological?
- Is the thing that is actually being tested important enough to the success of the system to justify the cost? If you are publishing an API, getting parameter order right is rather important. If all your users are going to check the code anyway before they use it, then maybe it is just a "nice to do".
As I think about what has been said so far on this thread, I am honestly surprised at how many non-technical factors and trade-offs seem to be creeping into the decision process of deciding what and how to test. Your post does a good job of stressing that point. So far we have:
- Non-use of tests vs. accuracy of tests
- Intended life cycle of the software
- Business value of the non-tautological portion of a test
I wonder what others will appear.
Best, beth
As I think about what has been said so far on this thread, I am honestly surprised at how many non-technical factors and trade-offs seem to be creeping into the decision process of deciding what and how to test.
Aren't most of the interesting (annoying, frustrating, etc.) problems in programming the non-technical ones?
Novice programmers can believe that the technical challenges are the only ones we consider. This is why they often apply technical solutions to business problems and are surprised when they don't work.
As we gain more experience, it's necessary to question why we do what we do. This question makes us all think about why we test what we test and what benefits we derive from those tests.