Re: Hierarchy of code reuse by flexibility
by moritz (Cardinal) on Feb 17, 2009 at 10:34 UTC
Thanks for joining the discussion. One note - everybody knows that 'cut and paste' is evil - yet there is a case where everyone would agree that it is not; on the contrary, it is the way to go: short examples like the synopsis in most CPAN modules. The situation with code generation is similar in a way. What I am looking for is a 'unified' view of all those cases.
Please note that the synopsis sections of CPAN modules are not code - they are documentation that has the form of code.
They only become code if you use something like Pod::Snippets to actually turn them into code.
And as useful as they are, I have found that they are already a maintenance burden, and I'd prefer a non-copy-and-paste way to assemble them, if possible. (So far I haven't found one, but I also admit that I didn't look all that closely.)
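To make that distinction concrete, here is roughly what a SYNOPSIS looks like from perl's point of view (the module name and calls are invented): everything between the POD directives is inert documentation until a tool such as Pod::Snippets extracts it and runs it as code.

    =head1 SYNOPSIS

        # This looks like code, but as far as perl is concerned it is
        # plain documentation: nothing here is compiled or tested.
        use My::Module;                        # hypothetical module name

        my $obj = My::Module->new( verbose => 1 );
        $obj->frobnicate;

    =cut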
"Thanks for joining the discussion. One note - everybody knows that 'cut and paste' is evil - but yet there is a case where everyone would agree that it is not - on the contrary - it is the way to go for short examples like the synopsis in most CPAN modules."
Well, any sort of code template is essentially automated cut-and-paste. If you run h2xs or Module::Starter, you'll get an outline of code to start working with... and myself, I used to use some emacs code to generate perl accessors as needed.
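For flavour, this is roughly the kind of boilerplate such a macro or template pastes in - the field names are invented, and your own generator will of course differ:

    # The sort of near-identical boilerplate an editor macro or module
    # template stamps out, once per field.
    sub width  { my $self = shift; $self->{width}  = shift if @_; return $self->{width}  }
    sub height { my $self = shift; $self->{height} = shift if @_; return $self->{height} }
    sub depth  { my $self = shift; $self->{depth}  = shift if @_; return $self->{depth}  }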
The distinguishing feature of cut-and-paste vs. code abstraction is that with C&P you get a starting point that can be mutated at will without fear of affecting any other uses. The advantage of code abstraction is that if the interface is well designed, and well understood, you can often fix bugs by changing just one place in the code... but you'd better have good tests and/or QA, or you might have accidentally broken a use case without realizing it.
C&P has a bad reputation because it's too easy to do, and beginning programmers often produce large piles of C&P'd crap that's exceedingly difficult to read, let alone maintain.
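To illustrate the abstraction side of that trade-off, here is a minimal sketch (class and field names invented) where the accessor logic lives in exactly one place, so a fix or a new feature propagates to every accessor - for better or worse:

    package Box;
    use strict;
    use warnings;

    # One shared definition: a bug fix (or added validation) lands here once
    # and every generated accessor picks it up.
    sub _make_accessor {
        my ($class, $field) = @_;
        no strict 'refs';
        *{"${class}::$field"} = sub {
            my $self = shift;
            $self->{$field} = shift if @_;
            return $self->{$field};
        };
    }

    __PACKAGE__->_make_accessor($_) for qw(width height depth);

    1;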
Re: Hierarchy of code reuse by flexibility
by ELISHEVA (Prior) on Feb 17, 2009 at 13:25 UTC
Although I often find generators useful, code generation, at least the template sort, is little more than automated cut and paste. I don't consider either a very flexible mechanism at all.
- modifications are constrained. A template-driven code generator can only substitute text where the template says it can. To modify the generated code beyond what the templates allow, you have to revise the generator.
- generated code can become a maintenance nightmare unless your generator is kind enough to tag its generated code with comments indicating the versions of both the data source and the generator software (see the sketch after this list). Otherwise, you have nothing to go on but date comparisons. However, date comparisons only help if you have customized your build process to be aware of all your generators, which is often a lot of work. And even a customized build system won't help if you have no way of comparing the version of the current generator to the version that produced the generated code. Code generated by old and new versions of a generator isn't always compatible with the other version or with the current project requirements. But note: these issues largely disappear if code is always generated from data/source files as described by Tanktulus below. In that case, it is no riskier than generating .o files from .c files.
- it can become even more of a maintenance problem if the generation process is incomplete and the generated code is hand-modified. Unless the generator makes advance provision for hand modifications, the generated code can't be regenerated. If the original generation process spewed out buggy code, you may be faced with hundreds or even thousands of lines of generated code that have to be found and hand-fixed.
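As a sketch of that tagging idea (the generator name, versions, and file handling are all invented), the generator can stamp every file it writes with enough provenance to tell later which generator, and which data, produced it:

    #!/usr/bin/perl
    # Hypothetical template-driven generator that records its own provenance
    # in every file it writes.
    use strict;
    use warnings;

    our $VERSION = '1.07';                     # generator version (made up)

    my ($data_file, $out_file) = @ARGV;
    my $data_version = '42';                   # would be read from the data source

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} "# DO NOT EDIT BY HAND.\n";
    print {$out} "# Generated by gen_code.pl $VERSION from $data_file",
                 " (data version $data_version) on ", scalar localtime, ".\n";
    # ... the actual generated code would be emitted here ...
    close $out or die "Cannot close $out_file: $!";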
The most flexible mechanism is well-designed code that uses the right mix of programming paradigms (OO, procedural, functional, etc.). Getting the right encapsulations and abstractions isn't always easy, but when it works, it really works and can grow and change quickly with your requirements. Solid design is more flexible because:
- it avoids replication of bugs - all code is located in one place rather than copied all over.
- it is easily composable
- if bugs are found, the code can be fixed internally without needing to revise consumer code.
- manual modifications can be made via subclassing or custom hooks, making it possible to regenerate even after extensive manual customization
- support for polymorphism (via objects, dispatch tables, or switch statements) can be built in so that the code changes behavior based on its environment (operating system, server loads, end user preferences, etc.) - see the dispatch-table sketch after this list
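A minimal sketch of that last point in dispatch-table form (the keys and behaviours are invented): the table, not the caller, decides which code runs for the current environment.

    use strict;
    use warnings;

    # Dispatch table keyed on the OS name in $^O: picking a temp-directory
    # strategy without a chain of if/elsif branches at every call site.
    my %tmp_dir_for = (
        MSWin32 => sub { $ENV{TEMP}   || 'C:\\Temp' },
        default => sub { $ENV{TMPDIR} || '/tmp' },
    );

    my $pick = $tmp_dir_for{$^O} || $tmp_dir_for{default};
    print "Using temp dir: ", $pick->(), "\n";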
So, I would actually put your list in reverse order:
- Well designed libraries - capable of real time adaptation and localized modifications that do not affect manual customizations and consumer code.
- Specific language constructs, e.g. switch statements, callbacks, inheritance - allow adaptive behavior for a narrow range of functionality
- Code generation and cut and paste - one off static modifications.
Best, beth
Update: revised comments on maintenance and build process to incorporate Tanktulus's excellent description of "safe" code generation practices.
Having written some code generation in perl, generating C, Java, shell, among others, all from the same source data, I'm going to say that code generation and cut&paste don't belong together. Mind you, that may be because I have a fully-automated code generator (i.e., runs in the nightly builds) rather than a one-off code generation where we check in the generated code rather than the code generator.
Making changes to one output instead of another is trivial. Adding data to the input is trivial (we store our input data in an XML-like format). If there's a different type of data that needs to be fed through from input to output, well, that depends on the flexibility of the code. In my case, I'd have to add probably about 3-6 lines of code to handle one new type of data, because the underlying code base is simply that flexible. Tagging with comments is not needed - it's built every night. Getting the build system to recognise it is easy: make can already take arbitrary commands as a way of generating intermediate files (i.e., "C" code) - the fun part is labelling all the pre-reqs so make knows when to rebuild it.
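A toy sketch of that one-source, many-targets shape (the record layout and field names are invented, and a real generator is of course far more involved than this):

    #!/usr/bin/perl
    # Toy version of one data record driving several target languages.
    use strict;
    use warnings;

    my %setting = ( name => 'retry_count', type => 'int', default => 3 );

    # C output
    printf "static %s %s = %s;\n", @setting{qw(type name default)};

    # Java output
    printf "private %s %s = %s;\n", @setting{qw(type name default)};

    # Adding, say, a shell output is just a few more lines of the same shape:
    printf "%s=%s\n", @setting{qw(name default)};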
Oh, and hand modification of generated code is verboten. Absolutely. If it's not a checked-in file, it can't be modified. Ever.
I absolutely agree that code generation bears a lot of risks. Particularly in a language like Perl that is not well designed for it.
However, I emphatically disagree about the potential returns. I have suggested in the past that you read On Lisp, and I'll repeat that suggestion here for its demonstrations of how powerful code generation can be. That said, code generation is a sledgehammer that should be swung carefully and precisely, if at all. (Personally, at the moment I do lots of code generation of SQL, but none of Perl.)
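For a taste of what generating SQL from Perl can look like, here is a minimal sketch (table and column names are invented; a real system would also worry about quoting, constraints, and per-database type differences):

    use strict;
    use warnings;

    # Build a CREATE TABLE statement from a declarative column spec.
    my @columns = (
        [ id         => 'INTEGER PRIMARY KEY' ],
        [ name       => 'VARCHAR(64) NOT NULL' ],
        [ created_at => 'TIMESTAMP' ],
    );

    my $sql = "CREATE TABLE users (\n"
            . join(",\n", map { "    $_->[0] $_->[1]" } @columns)
            . "\n);\n";

    print $sql;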
Hmm - I think I understand what you mean - and I mostly agree - but still I would say that I can make more changes in code that is copied or generated than in libraries. Maybe I should have used more precise words; maybe I should have called it 'modifiability' or something. But your reverse order raises a more interesting question: if cut and paste and generation really are the worst of all techniques on this 'flexibility' measure, then why are they useful at all? Actually, this hierarchy post was for me just a way of making sense of why and when to use code generation, cut and paste and libraries :)