in reply to Re: How does one avoid tautologies in testing?
in thread How does one avoid tautologies in testing?

Thank you for such a thoughtful response. I'm really glad you weren't afraid to write for fear of feeling stupid. Thinking about what you said gave me a couple of ideas to explore.

You raise a really good point about structural testing. I'm not sure either how it applies to my current project, but it is worth spending some time figuring out how. I've definitely used that technique before, and it lends itself easily to random input testing. The most memorable time was when I was testing a calculation engine. In addition to the usual hard-coded tests for inputs and outputs, I added a bunch of tests that verified that the operators complied with certain provable relations for real numbers (fields), for example, that -(-x) = x or a + b = b + a. After "anchoring" test results using literal values as input/output pairs, I threw a large number of randomly generated inputs at the engine and verified that the operators continued to be well behaved. I enjoyed that a lot. (I also learned a lot more than I ever wanted to know about the difference between numbers as a mathematical concept and the computer representations of numbers.)
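That field-relations approach can be sketched roughly like this. The engine_neg and engine_add subs below are hypothetical stand-ins for the real engine's operators; in the actual suite they would call into the engine under test:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-ins for the calculation engine's operators.
sub engine_neg { my ($x) = @_; return -$x }
sub engine_add { my ($x, $y) = @_; return $x + $y }

# First, "anchor" with hard-coded input/output pairs...
is( engine_add(2, 3), 5, 'anchored: 2 + 3 = 5' );

# ...then verify provable field relations on randomly generated inputs.
for (1 .. 100) {
    my ($a, $b) = map { rand(2000) - 1000 } 1 .. 2;
    is( engine_neg(engine_neg($a)), $a,                  '-(-a) = a' );
    is( engine_add($a, $b), engine_add($b, $a),          'a + b = b + a' );
}

done_testing();
```

The nice property of the relation tests is that they never need to know the "right answer" for any particular random input, so they can't be tautological copies of the implementation.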

I totally agree with your discomfort about 'extracted data', and that was a large part of the reason I posted. However, your reading of my first option (alternate algorithm) seems closer to what I meant by the second option (recode from spec). By "alternate algorithm" I meant finding a genuinely different algorithm. In principle, it isn't that different from a technique we were both taught in Physics 101 or even earlier: if there are two ways to solve a problem, check your work by solving it both ways. Unfortunately, there isn't always a good second algorithm.
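For illustration only (not my actual project), here is the two-ways idea in miniature: an iterative sum cross-checked against Gauss's closed form. A bug would have to appear in both independent algorithms, in exactly the same way, to slip through:

```perl
use strict;
use warnings;

# Two genuinely different algorithms for the same problem.
sub sum_by_loop    { my ($n) = @_; my $s = 0; $s += $_ for 1 .. $n; return $s }
sub sum_by_formula { my ($n) = @_; return $n * ($n + 1) / 2 }

# Cross-check them over a range of inputs, including edge cases.
for my $n (0, 1, 10, 1000) {
    die "mismatch at n=$n" unless sum_by_loop($n) == sum_by_formula($n);
}
print "both algorithms agree\n";
```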

By extracting, I meant something different from your split example, which I agree is a bad idea. Instead I meant using information from a mock environment and combining it with made up information to produce a result. Perhaps 'extraction' was a bad choice of words.

Consider the example given above of converting the relative path to a fully qualified path. I can and did make up the relative path portion of the file names. But I couldn't just make up a fully qualified path because in that particular test suite, one test verified that files were created and deleted according to spec. Another test verified that file search filters indeed selected the correct files from the file system. A third test verified that one of my BUCOs correctly generated a family of subdirectories and bound the correct (in memory) object to each subdirectory. This isn't a version control system, but parts of the application do interact with the file system in ways that are as complex. (Aside: I wonder how subversion and git do their testing...surely they have similar issues?)

So the root has to be a real root on a real computer, and I needed an assortment of real files to be searched for and selected within that root directory. I see only two ways of safely creating a real root directory for tests that doesn't conflict with any existing directory: (a) run the tests on a clean virtual machine where I can safely control exactly which files already exist, or (b) generate a random temporary directory on an existing machine (and auto-delete it).
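Option (b) is essentially what the core File::Temp module provides. A minimal sketch (the file names here are made up for illustration):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A random, auto-deleted root directory for this test run.
# CLEANUP => 1 removes the directory and its contents at program exit.
my $root = tempdir( 'testroot_XXXXXX', TMPDIR => 1, CLEANUP => 1 );

# Populate it with a few real files for the search/selection tests.
mkdir File::Spec->catdir($root, 'sub') or die "mkdir failed: $!";
for my $rel ( 'alpha.txt', File::Spec->catfile('sub', 'beta.txt') ) {
    my $path = File::Spec->catfile($root, $rel);
    open my $fh, '>', $path or die "can't create $path: $!";
    print {$fh} "test data\n";
    close $fh;
}

die "setup failed" unless -e File::Spec->catfile($root, 'alpha.txt');
print "root $root ready (auto-deleted at exit)\n";
```

Because tempdir hands back a name that is guaranteed not to collide with an existing directory, the "conflict with existing files" risk goes away without needing a virtual machine.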

The first, a virtual machine, would allow me to make things up entirely, but its setup costs would make it unrealistic for repeated debug-patch cycles. A test suite that requires a virtual machine is also not very appropriate for CPAN modules. I'd like a solution (or at least a philosophy) that could be used for anything I decide to release on CPAN.

The second option, generating a random root directory, can be set up easily and quickly, but runs the risk of calculation errors. It means that I have to concatenate the root to one or more made-up relative paths. Concatenating a root with a relative path may be a trivial algorithm, but, all the same, it is definitely vulnerable to stupid typos.
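One way to shrink the typo surface is to let a well-tested core module do the joining, so the only hand-written pieces are the path components themselves. A sketch using File::Spec (the root and components are made up for illustration):

```perl
use strict;
use warnings;
use File::Spec;

my $root = '/tmp/testroot_a1b2c3';      # stands in for the generated random root
my @rel  = ('data', 'input.txt');       # the made-up relative portion

# File::Spec knows the platform's separator and handles the joining,
# so there is no hand-rolled string concatenation to get wrong.
my $full = File::Spec->catfile($root, @rel);
print "$full\n";
```

On Unix this prints /tmp/testroot_a1b2c3/data/input.txt; on other platforms File::Spec uses the native separator, which hand concatenation would get wrong anyway.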

Both options have risks and at the end of the day we have a trade off between different risks (non-use of test vs. inaccuracy of test). But if you can think up a way to do the kind of testing I just described using only made-up hard coded values and no virtual machine, I won't object :-).

Best, beth

Re^3: How does one avoid tautologies in testing?
by moritz (Cardinal) on Jul 15, 2009 at 21:47 UTC
    The second option, generating a random root directory, can be set up easily and quickly, but runs the risk of calculation errors. It means that I have to concatenate the root to one or more made-up relative paths. Concatenating a root with a relative path may be a trivial algorithm, but, all the same, it is definitely vulnerable to stupid typos.

    You're right there, but your options aren't totally exhausted here.

    • Rely on an existing, well tested solution to do the concatenation for you
    • You might be able to chdir into the newly created root dir and use only relative paths
    • Maybe your API has an option to hand back the fully qualified paths. If that's the case, you can for example check if the file exists (-e), has some content (-s > 0), maybe even has the right magic numbers etc.
    • Another option if you get back the fully qualified paths is to check if the made-up relative part of the path is a substring of the absolute path
    • Again another option is to recursively search through your root, looking for files of the right name, or maybe just looking at the total number of files before and after a create operation.
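    A sketch combining the chdir and file-test suggestions. The file-creating action below is a hypothetical stand-in for the real API call under test:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# chdir into the random root so the tests only ever see relative paths,
# then verify results with cheap file tests rather than exact path strings.
my $root = tempdir( CLEANUP => 1 );
my $old  = getcwd();
chdir $root or die "can't chdir to $root: $!";

# Hypothetical stand-in for the system-under-test creating a file:
open my $fh, '>', 'report.txt' or die "open failed: $!";
print {$fh} "42\n";
close $fh;

die "file was not created" unless -e 'report.txt';
die "file has no content"  unless -s 'report.txt' > 0;

chdir $old or die "can't chdir back: $!";   # restore cwd so CLEANUP can remove $root
print "relative-path checks passed\n";
```

    Because every assertion uses a relative path, no test ever needs to compute (or typo) the fully qualified form at all.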

    Yes, you give up some precision in testing, but if you do fear the tautologies that might be a worthy trade-off. Or not, depending on your situation of course.

    (Aside: I wonder how subversion and git do their testing...surely they have similar issues?)

    SVN has different storage backends, and I guess you can pretty much black-box test them through their API: pump in lots of new revisions of entirely made-up data, and see whether you can later retrieve it. Nobody really cares which file the backend puts which data in.

    I guess with git it's pretty much the same: when you add a commit, there's no guarantee which packfile it will end up in; things like git-repack will change that anyway. The important thing is just that the files on disk still conform to their file format after the write, and that the data can be retrieved again.