Everyone "knows" that if you use a code generator to generate your "source" code, then you should always include the code generation step as part of the Makefile. For example:
derived.c: derived.c.pl $(MODULES.pm)
        perl derived.c.pl > $@
However, after using this style for many years, I’ve come to realize that it has many failings. One problem is an explosion of rebuilds when things change (example: a generated global header file touches everything that includes it). The extreme case is a refactoring within the code generator itself: the entire project must be recompiled, even though, by definition, the refactoring does not change the contents of any "source" file. This effect can be mitigated by clever makefiles, but it is not ideal.
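
For the record, the usual "clever makefile" mitigation is the move-if-change pattern: generate into a temporary file, and only replace the real output when the content actually differs. A minimal sketch, using the derived.c example above (the .tmp name is just for illustration):

derived.c: derived.c.pl $(MODULES.pm)
        perl derived.c.pl > derived.c.tmp
        cmp -s derived.c.tmp $@ && rm derived.c.tmp || mv derived.c.tmp $@

Downstream objects then rebuild only when the generated text really changed; but make still reruns the generator on every build (hence "not ideal").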

Another issue, possibly a bigger problem, is that the use of code generators inhibits refactoring of the generated code. A common sentiment is “it’s like the output of a compiler, we don’t care what it looks like”, which quickly leads to “it’s a mess: I don’t understand it, I’d better not touch it”, and eventually becomes “the code generator doesn’t support this feature, so let’s not do it”. A code generator should not inhibit the creativity of its users.

I believe that there is a better approach: use the code generator as a test. Here’s how it works:

DIFF = diff    # override on the command line, e.g. "make tests DIFF=gvimdiff"

tests: source.c.diff

source.c.diff: source.c.gen source.c
        $(DIFF) $^ > $@ || touch source.c.gen   # rerun next time if it fails

source.c.gen: source.c.pl $(MODULES.pm)
        perl source.c.pl > $@

% make tests
…
% make tests DIFF=gvimdiff
The first part of this is quite obvious: we create the generated code (as before); but we use it only as a test. This encourages people to think of the source code as real code: you make it readable, and you check it into the source control system.

The second part is a bit more subtle, but is where you leverage the power of the code generator. The code generator becomes a partner: suggesting changes, but not forcing you to make them. If the diff fails, then you can use your favorite merge tool to accept/reject individual changes. Or, if you’re feeling brave:

% make tests DIFF=cp
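
As one concrete way to do the accept/reject merge mentioned above: since the makefile redirects the DIFF command's stdout, an interactive tool is best run by hand. For example, sdiff (from GNU diffutils) can merge the two files hunk by hunk; the .merged name here is just for illustration:

% sdiff -o source.c.merged source.c.gen source.c
% mv source.c.merged source.c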
The point of all this is to free programmers from the tyranny of the code generator. It becomes possible to make a simple change to the C code, without having to fix the code generator immediately. You can still build the software, and run it. Sure, a test fails; but you can postpone fixing it for a few hours if you want. There is no worry that the code generator will suddenly come along and splat your changes.

Another advantage of the DIFF approach is that it becomes easier to maintain the code generator. If you are refactoring the code generator, then the DIFF is a perfect test (it defines exactly the required behavior). If you change the C code, then you can easily practice test-first programming: the DIFF tells you what needs to change in the code generator. You don't have to rebuild the exe for every minor change!
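
Concretely, the test-first loop looks something like this (a sketch, using the file names from the example above):

% vi source.c       # make the change by hand; build and run as usual
% make tests        # the DIFF fails: the generator is now behind
% vi source.c.pl    # teach the generator to produce the new code
% make tests        # the DIFF passes: generator and source agree again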

You can get some of these benefits with clever makefiles in a generate-in-build flow, but I don’t think there is much to gain by doing so. Separating the generation and build flows (coupling them only in the tests) has many benefits and few disadvantages.

--Dave.

Re: Code Generators as tests.
by lachoy (Parson) on Aug 19, 2002 at 20:48 UTC

    This is a very interesting subject, and I'm always surprised that Perl folk don't deal with it more often, since Perl has great text manipulation built in and world-class templating systems available.

    That said, I strongly disagree with keeping generated code in the source control system. In my experience it winds up getting out of sync with the actual state of the project. Developers should be aware that changes are being made to the generated code, and the best way to make them aware is that they have to compile additional files whenever they build the project. Smart code generators will not regenerate a file whose content is identical to the one that already exists, which lessens or eliminates the "having to recompile everything" problem.

    I think the other conditions you speak of here indicate a code generation system needing repair. (I hesitate to say "broken" because it clearly functions, but suboptimally.) If people are treating the code generator as a black box then they need to understand the parameters for changing the generated code: how easy is it, what are typical changes, etc.

    I'm ambivalent about whether all developers need to know everything about the code generation system. It would be great if they did, and it would prevent the 'guru getting hit by a bus' problem, but most often there's simply not enough time.

    As usual: according to my experience, with my two cents, etc.

    Chris
    M-x auto-bs-mode

      Until fairly recently, I'd have agreed with you. However, I think it is human nature that is broken, not my code generation environments :-). In my experience, once you start programming with metadata, the metadata evolves to become Turing complete. People want conditionals (and later, iteration) within the abstraction they are concentrating on. And "people" includes me!

      Smart generators may be able to avoid regenerating files that don't change, but many versions of "make" are dumb: they see the dependency in the DAG and follow it all the way up, without checking whether the output really changed at a given step.

      I am not really suggesting that you put generated code in the source control system: I'm suggesting that you put source code in it. The fact that you used an automated partner to help you write that code is unimportant.

      I am very aware of the tendency for the checked-in and generated code to diverge: that's what the tests are for. They fail when the two differ. At that point, there's a choice: either fix the test (i.e. the generator), or throw it away (the generator was just a wizard). --Dave