http://qs1969.pair.com?node_id=490233


in reply to Re: Documenting non-public OO components
in thread Documenting non-public OO components

I assume that you refer to the technique of documenting the functionality of a module indirectly by providing tests which double as example code, labeled by a message which indicates what the test is for. That is certainly useful as a supplement, but not as a substitute for the summary of functionality that typically appears in POD or javadocs.
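For concreteness, here is the sort of labeled test I take you to mean (the TokenStream package is invented for illustration):

```perl
use strict;
use warnings;
use Test::More;

# TokenStream is a made-up example class; the test labels below double
# as usage documentation, which is the technique under discussion.
package TokenStream;
sub new        { my ( $class, @toks ) = @_; bless { toks => [@toks] }, $class }
sub next_token { my $self = shift; shift @{ $self->{toks} } }

package main;
my $stream = TokenStream->new(qw( foo bar ));
is( $stream->next_token, 'foo',  'next_token() returns tokens in order' );
is( $stream->next_token, 'bar',  'next_token() advances the stream' );
is( $stream->next_token, undef,  'next_token() returns undef when exhausted' );
done_testing();
```

Such a test tells you *how* to call next_token(), but not why the stream exists or where it sits in the larger data flow; that is the role of the prose summary.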

If you were trying to grok the flow of data through several classes, and through perhaps 10 or 20 methods, would you prefer dredging the intended use for each method out of the test code over consulting a purpose-built summary? I think you would have a hard time achieving the level of confidence required by the second test in your sig. Bad software! Bad! :)

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com


Re^3: Documenting non-public OO components
by dragonchild (Archbishop) on Sep 08, 2005 at 16:19 UTC
    If you were trying to grok the flow of data through several classes, and through perhaps 10 or 20 methods, . . .

    Lord and Lady preserve us! How complicated are you making your code?! If I have to do what you're suggesting, that is a CodeSmell and needs to be addressed immediately, preferably through refactoring the living #$E*! out of it. I should be able to look at a test and see exactly how I'm supposed to use any portion of your code. Period, end of story. Otherwise, either your tests or your design sucks. (Or both, but if that's the case, then you're better off starting from scratch.)

    Remember - we're talking about the non-public portions of your API. This is the stuff that you don't want the average client using. Since the intended audience is a developer, they should be looking through the tests anyway, to understand the assumptions you've made that your code won't document. In addition, they are probably looking through your tests in order to figure out how to test their modifications to, or subclass of, your stuff.

    Furthermore, I'll point at DRY (Don't Repeat Yourself). A purpose-built summary is, by definition, a repetition. It has to be kept in sync by hand and thus, by definition, won't be in sync. The tests, on the other hand, are part and parcel of the code in question. If the test suite passes (as it always should), then I know that the tests are the best possible form of documentation the developer could have provided. (They're even better than any comments that might be in the code.)
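A tiny illustration of the sync problem dragonchild describes (trim() and its POD are contrived for the example):

```perl
use strict;
use warnings;
use Test::More;

# The POD below claims trim() strips only trailing whitespace, but the
# implementation now strips both ends.  Nothing forces the prose to
# change; the test, by contrast, must agree with the code to pass.

=head2 trim

Removes trailing whitespace.  (Stale: the implementation was later
extended to trim leading whitespace as well.)

=cut

sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

is( trim('  x  '), 'x', 'trim() strips whitespace from both ends' );
done_testing();
```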

    While we're on the topic, the documentation for the public API should consist of the following:

    • What is this thing (NAME)
    • Basic usage (SYNOPSIS)
    • Ideas behind it (DESCRIPTION)
    • Details about the names of the methods and the various options (METHODS)
    • Common uses (EXAMPLES)
    • Anything else the client should know (BUGS/CAVEATS)
    • How to get more help (CONTACTS/AUTHORS)

    No more and no less. (Any similarity to the standard POD skeleton is completely intentional.) At no time should any reference be made to implementation details, save when one public method calls another public method within the same module (and that should be done sparingly). Private methods should never appear, save when they violate some community contract (such as having "new" or "clone" be a private method) or some Perlism (such as having "print" be a private method). And, frankly, if you have to document a private method in your public API, that's a DesignSmell and should be addressed ASAP.
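For reference, that standard skeleton in POD form (Foo::Bar is a placeholder name):

```
=head1 NAME

Foo::Bar - one-line description of what this thing is

=head1 SYNOPSIS

    use Foo::Bar;
    my $bar = Foo::Bar->new( option => 'value' );

=head1 DESCRIPTION

The ideas behind the module.

=head1 METHODS

=head2 new

Method names and their options.

=head1 EXAMPLES

Common uses.

=head1 BUGS AND CAVEATS

Anything else the client should know.

=head1 AUTHOR

How to get more help.

=cut
```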


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Lord and Lady preserve us! How complicated are you making your code?!

      To get a sense of the magnitude of the project, please peruse http://lucene.apache.org/java/docs/api/index.html.

      If I have to do what you're suggesting, that is a CodeSmell and needs to be addressed immediately, preferably through refactoring the living #$E*! out of it.

      You may wish to direct your critiques to the Lucene developers' list, though I'm not sure how they will be received unless you refactor out the condescension. ;)

      I've written a full-featured pure-Perl search engine (Kinosearch), and while the class-level architecture is less complex than Lucene's, that comes at the price of having more functionality inlined than it should. Building an inverse index is just complicated, period. Either you have to write functions which are so complex that they ought to be refactored (Kinosearch) and Dragonchild presumably disapproves, or you factor out the functionality so that data passes through several classes and 10 or 20 methods (Lucene) and Dragonchild definitely disapproves. Catch-22. :)

      I gather from your comments that you are a strong proponent of test-driven development. FWIW, I appreciate this school and try to adhere to its principles when possible, but with all due respect, I don't consider it the One True Way.

      Period, end of story... [snip] No more and no less... [snip] At no time... [snip] ... never... [snip]

      Hmm. This is awkward. I hope we can avoid descending into a religious argument. There are certainly things I can learn from our discussion, but as a Utilitarian by disposition, I find these absolutes off-putting. Some Java developers start by writing javadocs first, an approach which is somewhat similar to writing tests first, in that you don't just start out by writing code. It violates DRY, and it doesn't follow the pattern of "only write code when you have a failing test", but in both cases, the developer is thinking about the high-level interface first. IMO, developers who work this way are not terrorists who hate our freedom and are out to destroy our way of life. Good software which does useful stuff can be written in many ways.

      I had an interesting chat with Ian Langworth, co-author of "Perl Testing - A Developer's Notebook" at OSCON. I asked him, How do you write tests for an inverse-indexing app? Absolutely, you have to have high-level tests which consist of queries against the index. But the process of building the index is quite complicated, and arguably every link in the chain is an implementation detail. How do you avoid tests for the intermediate processes which rely on the innards of what ought to be a series of black boxes? The answer is... there's no easy answer. Mock objects help. But there's going to be a fair amount of waste... It was a very helpful interchange, and my tests are better for it.
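A sketch of the mock-object approach Langworth suggested: test one stage of the indexing chain in isolation by faking its upstream dependency. Both package names here are invented for illustration, not taken from KinoSearch or Lucene.

```perl
use strict;
use warnings;
use Test::More;

# A hand-rolled mock standing in for the real analysis chain.
package MockTokenStream;
sub new        { my ( $class, @toks ) = @_; bless { toks => [@toks] }, $class }
sub next_token { my $self = shift; shift @{ $self->{toks} } }

# The intermediate stage under test: it only needs something that
# responds to next_token(), not the whole tokenizer pipeline.
package PostingsCollector;
sub collect {
    my ( $class, $stream ) = @_;
    my %freq;
    while ( defined( my $tok = $stream->next_token ) ) {
        $freq{$tok}++;
    }
    return \%freq;
}

package main;
my $mock = MockTokenStream->new(qw( cat dog cat ));
is_deeply(
    PostingsCollector->collect($mock),
    { cat => 2, dog => 1 },
    'term frequencies computed from a mocked token stream',
);
done_testing();
```

The waste Langworth and I discussed shows up when the stream's real interface changes: every such mock must be updated by hand to match.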

      In conclusion... Your comments are appreciated, but the recommendations are inappropriate for my current task, which is porting an existing library not designed from the ground up according to the principles you set out. For the time being, I intend to convert Lucene's excellent javadocs to POD, rather than delete them and replace them with tests. But even if I were to start over and write another search engine library from scratch, I think adopting a dogmatic, absolutist TDD workflow would be unnecessarily ascetic and would yield documentation both excessively voluminous and noisy (in the information-theory sense). Better to adopt bits and pieces of TDD, specifically the emphasis on maintaining a comprehensive test suite, while supplementing the code with documentation via tests, comments and summaries.

      Regards,

      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com
        Building an inverse index is just complicated, period. Either you have to write functions which are so complex that they ought to be refactored (Kinosearch) and Dragonchild presumably disapproves, or you factor out the functionality so that data passes through several classes and 10 or 20 methods (Lucene) and Dragonchild definitely disapproves. Catch-22. :)

        Many tasks are just plain complicated on their face. Writing an SQL generator that takes into account arbitrary schemas, arbitrary constraints, and selective denormalization, then builds the correct optimized SQL is hard. It is correctly broken out into functional areas. One of those areas is the use of a directed acyclic graph (DAG) to represent the schema. I certainly didn't write that code (though I ended up rewriting it to suit my needs). But, that was a conceptual black-box interface. Although I know nothing about inverse indices, I'm pretty sure that, like all other CS problems, it decomposes quite nicely into areas that are generalizable. Anything that's generalizable is a conceptual interface that is another distribution.

        Yes, your data is going to pass through different distros, and that's ok. The big thing to focus on is the idea of separation of concerns. My SQL generator didn't need to know how a DAG worked, just that if you hit the red button while pulling the green lever, the blue light will come on. I suspect that there's a lot of stuff on CPAN you can certainly reuse, reducing your coding (and testing) burden.
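        The red-button/green-lever point can be sketched like this (both packages and the toy join logic are invented for illustration, not my actual SQL generator):

```perl
use strict;
use warnings;

# The generator depends only on the DAG's interface (parents_of), not
# on how the graph is stored or traversed internally.
package Schema::DAG;
sub new        { my ( $class, %edges ) = @_; bless { edges => \%edges }, $class }
sub parents_of { my ( $self, $node ) = @_; @{ $self->{edges}{$node} || [] } }

package SQL::Generator;
sub join_path {    # walk ancestors to build a (toy) join clause
    my ( $class, $dag, $table ) = @_;
    my ( @joins, @queue );
    @queue = $dag->parents_of($table);
    while ( my $parent = shift @queue ) {
        push @joins, "JOIN $parent";
        push @queue, $dag->parents_of($parent);
    }
    return join ' ', @joins;
}

package main;
my $dag = Schema::DAG->new( order_item => ['order'], order => ['customer'] );
print SQL::Generator->join_path( $dag, 'order_item' ), "\n";
# prints: JOIN order JOIN customer
```

        Swap in a different graph representation and join_path() never notices, which is exactly the separation of concerns that lets each piece live (and be tested) as its own distribution.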

        IMO, developers who work this way are not terrorists who hate our freedom and are out to destroy our way of life. Good software which does useful stuff can be written in many ways.

        Absolutely (*grins*) true on both points. However, read my signature. Good software can come from many places, but it has two very basic criteria - it works and someone else can safely modify it. If it doesn't meet those two things, it isn't good software. And, frankly, that is an absolute.

        Now, how do we meet these criteria? Well, the most efficient way (thus far) is TDD. How do you do TDD with a complex system? By mocking up your interfaces. You test your intermediate items by mocking up their dependencies. Then, you have some system-level tests which exercise the system as a whole without mocks, and you're 80-90% of the way there.

        How do you avoid tests for the intermediate processes which rely on the innards of what ought to be a series of black boxes? The answer is... there's no easy answer. Mock objects help. But there's going to be a fair amount of waste...

        Waste? I don't see waste. Yes, you will have to keep your mock objects current with the spec of your intermediate sections. However, while specs may grow, and quickly at times, they should change items very slowly. Otherwise, it's an incompatible interface change which should happen either with a new major version or in alpha software. Anything else is churn which screws you up no matter what paradigm(s) you're using.

        . . . the recommendations are inappropriate for my current task, which is porting an existing library not designed from the ground up according to the principles you set out.

        You're rewriting the library from scratch, period. You're using a different language and you have access to different libraries and language features. You may be preserving the same API and functionality, but it's still a complete rewrite. Porting the docs is all well and good, but you still need tests written against the spec that your code will first fail against, then pass as you write the minimum necessary. I have no experience with Lucene, but I am 100% positive that there is cruft in that codebase. By rewriting against the spec using the existing codebase as a reference, you most certainly can use TDD. In addition, you can probably end up with several new distros to add to CPAN whose usefulness isn't limited to inverse indexing.

        I'm doing a very similar project in JavaScript, porting the Prototype library to JSAN. Instead of just throwing it up there, I'm converting the innards to JSAN distributions, cleaning them up and renaming them. I'm also keeping a compatibility layer so that existing users of Prototype (such as Ruby-on-Rails and Catalyst) can convert over to JSAN with little change in their existing codebase while taking advantage of the better code. I suspect you will find that you can do the same.


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?