PerlMonks  

(OT) Generated Code vs. Libraries

by Mutant (Priest)
on Oct 21, 2004 at 12:40 UTC ( [id://401139]=perlquestion )

Mutant has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks.... I hope this is not too off-topic, but it does relate to Perl applications.

I've just joined a company that has an application that generates a bunch of CGI scripts based on certain values. These form the 'backend' of a project, and this has become quite critical for the company. I've managed to convince them that the existing code is unmaintainable, and am in the process of doing a complete re-write.

My problem is, I'm having difficulty deciding whether I should stick to the CGI generation method, or make it all run out of one CGI (or mod_perl handler) with a config read for each individual project. I guess the pros and cons of each are fairly self-explanatory, but to state the obvious: code generation makes it much more difficult to fix bugs in previously created projects, while pushing everything into a common library runs the risk of everything breaking when a bug is fixed (and we're talking hundreds of projects, the continued uptime of all of which is critical to the company).
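A minimal sketch of the "one CGI plus a config per project" option, for comparison. The project names, config keys, and the `render_project` routine are all invented for illustration; a real version would presumably read each project's settings from a file rather than a hard-coded hash.

```perl
use strict;
use warnings;

# Hypothetical per-project settings; in practice these might be read
# from one config file per project rather than hard-coded.
my %project_config = (
    alpha => { title => 'Alpha Survey', fields => [qw(name email)] },
    beta  => { title => 'Beta Signup',  fields => [qw(name)], theme => 'dark' },
);

# One shared routine renders any project from its config.
sub render_project {
    my ($name) = @_;
    my $cfg = $project_config{$name}
        or die "Unknown project: $name\n";
    my $theme = $cfg->{theme} || 'default';
    my @lines = ( sprintf('%s [%s]', $cfg->{title}, $theme) );
    push @lines, "field: $_" for @{ $cfg->{fields} };
    return join "\n", @lines;
}

print render_project('alpha'), "\n";
```

A bug fixed in `render_project` reaches every project at once, which is exactly both the attraction and the risk described above.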

Although I probably naturally lean towards running it all from a library (and putting a sound release procedure in place), there will almost definitely have to be an option to generate the CGI, since a few projects need customisation which will likely be beyond the scope of my generator. Besides that, the fact that the existing system (which sort of works) does things this way means that it's a greater risk for me personally (i.e. it could be seen as a case of fixing something that wasn't broken if it all goes wrong).

Anyway, any wisdom you can share on this problem will be greatly appreciated :)

Replies are listed 'Best First'.
Re: (OT) Generated Code vs. Libraries
by jordanh (Chaplain) on Oct 21, 2004 at 13:45 UTC
    Without knowing more about this, I'd say your biggest challenge here is what's known as design recovery. It doesn't sound like you have a lot of design docs or requirements beyond what is implicit in the system that "sort of works". You're going to have to recover the design from this implementation.

    I would suggest not doing anything too drastic, not right away. I would attempt to build up a common library that covers as much of the functionality shared by the generated CGIs as possible, and migrate the generator to use these modules in the CGIs it generates. Only when you have the code factored in such a way will you be able to see the true requirements.
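    As a rough sketch of that intermediate step, the generator could shrink to emitting thin stubs that hand off to the shared modules. The module name `MyApp::Backend`, its `run()` function, and the config path are hypothetical placeholders, not anything from the existing system.

```perl
use strict;
use warnings;

# Hypothetical names throughout: MyApp::Backend and its run() function
# stand in for whatever the shared library ends up being called.
sub generate_stub {
    my (%args) = @_;
    return <<"END_STUB";
#!/usr/bin/perl
# GENERATED FILE: regenerate instead of editing by hand.
use strict;
use warnings;
use MyApp::Backend;

MyApp::Backend::run(project => '$args{project}', config => '$args{config}');
END_STUB
}

my $stub = generate_stub(project => 'alpha', config => '/etc/myapp/alpha.conf');
print $stub;
```

    The less logic each generated file carries, the less there is to get out of step when the library changes.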

    You'll need a good test bed and the ability to migrate individual projects over to your new common libraries to ensure you don't break anything. Set up your development and test environment first. You need a dev environment where you build your new modules and a separate test environment where you field your changes in an environment as close to 'real world' as you can. If you can't afford building up this kind of infrastructure, perhaps you can't afford to rework this system.

    If you can't afford to rework the system, then I'd propose that you spend all the time trying to understand and document what you've got so that you can maintain and debug it as it is currently structured.

    Be very clear in your goals. I believe you when you say it's a mess, but have clear ideas about what you hope to improve. Is it too inflexible to extend? Is performance poor? Remember that your various goals might be at cross-purposes with one another. You might increase flexibility and decrease performance.

    You stated that debugging is too difficult, which suggests comprehensibility is a big problem, which is why I suggest the approach of improving it slowly with stepwise refinements. Don't be afraid of building up something that you'll eventually throw away in favor of a new architecture from the ground up. You might be building up these modules only for the purpose of understanding the system better so that you can intelligently rebuild it.

    Updated: Grammar and spelling.

      Jordanh is right. A test bed is critical.
Re: (OT) Generated Code vs. Libraries
by dragonchild (Archbishop) on Oct 21, 2004 at 13:31 UTC
    While I have my thoughts on what I'd do if the project was just beginning ... remember the all-important adage:

    If it ain't broke, don't fix it!

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: (OT) Generated Code vs. Libraries
by ggg (Scribe) on Oct 21, 2004 at 15:03 UTC
    Remember the Physician's Code: "Do no Harm".

    For the sake of the company and your job/reputation, take a conservative approach. As the new guy, you want to have the result be noticeably better without causing any problems.

    ggg
re: Generated Code vs. Libraries
by bibliophile (Prior) on Oct 21, 2004 at 14:04 UTC
    I'm kind of in the same boat... but it's my own damn fault :-) Our system has grown cruftily over the years (I demo a prototype of a cool new feature, but then it gets rushed into production without proper integration).

    I'm slowly converting to the "common library" theme... but it's a real pain. It will be worth it, though (or so I keep telling myself).

    I'm actually going with a few libraries. The "common" common stuff that everything needs, and separate libs that distinct groups need.

    Update: Changed dorky title to something relevant.

Re: (OT) Generated Code vs. Libraries
by perrin (Chancellor) on Oct 21, 2004 at 18:14 UTC
    Putting aside the specific situation and politics you are dealing with, in general, code generation is a sign that you are doing something wrong. Usually it means you are not abstracting things correctly or are trying to do the compiler's job. There is some discussion of it here. Note that I don't count run-time code generation like the various method-maker modules to be the same thing, since there is no permanent code created from it.

      I completely disagree. Code generation is a sign that you've found a way to get the computer to do something tedious and repetitive for you, thus saving valuable programmer time and eliminating possible human error.

      Some examples:

      MS decided not to provide any reliable way to get stub error messages for socket errors. You can use FormatMessage() with GetLastError() (like how strerror(errno) works on Unix), but not with Winsock's WSAGetLastError(), for some reason or another. So I went to the page documenting the error codes returned by WSAGetLastError() and found each code had a short description beside it. I wrote a little Perl script that used LWP to fetch that page from the MSDN website, parse the HTML and extract each error code and error description, then generate C code for a lookup table of error strings. Problem solved.

      I have a subsystem in a C library of mine that basically just encapsulates a few common structs, providing New() constructor routines, Get() and Set() accessor routines, and Destroy() destructor routines. The documentation for these routines would be quite predictable, so I wrote a Perl script that parsed the header file (which itself was automatically generated from the source file) and generated POD documentation for each routine. Now I only need to add a few additional, routine-specific bits of information here and there and it's done. Problem solved.

      I wanted an XSUB interface to some C code of mine. I wrote a code generator that generated the XSUBs for me along with some special functions and CODE: and PPCODE: sections to do some neat stuff way beyond what h2xs (another code generator) is capable of producing. Time saved, problem solved. (xsubpp itself is a code generator, by the way.)
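      To make the first example concrete, here is a small sketch of just the table-generation step. The three codes are real Winsock values, but the struct layout and the `wsa_errors` name are made up, and the LWP fetch and HTML parsing from the original script are left out.

```perl
use strict;
use warnings;

# A few real Winsock error codes with short descriptions; the original
# script scraped the full list from MSDN, which is not reproduced here.
my @errors = (
    [ 10004, 'WSAEINTR',        'Interrupted function call' ],
    [ 10013, 'WSAEACCES',       'Permission denied' ],
    [ 10061, 'WSAECONNREFUSED', 'Connection refused' ],
);

# Emit C source for a lookup table (struct and array names are made up).
sub generate_c_table {
    my @rows = @_;
    my $c = "static const struct { int code; const char *text; } wsa_errors[] = {\n";
    for my $row (@rows) {
        my ($code, $name, $text) = @$row;
        $c .= sprintf("    { %d, \"%s\" }, /* %s */\n", $code, $text, $name);
    }
    $c .= "};\n";
    return $c;
}

print generate_c_table(@errors);
```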

      You'll probably say that I shouldn't have had to generate that lookup table, that MS should have made FormatMessage() work with WSAGetLastError(), or that C should be C++ and make encapsulating abstract data types easier, that XS ought to be more flexible, or whatever. The fact is that we don't live in a perfect world with perfect software. The stuff we use often doesn't work or doesn't work very well, and we as developers find ourselves having to pick up the pieces.

      Code generation can help with that, doing the repetitive stuff for you and saving some serious time in the process.

      See The Pragmatic Programmer, page 102, for a discussion of this.

        I think you're missing the point. The idea is simply that anything you can do via code generation could be done using subroutines instead, provided you are using a high-level dynamic language like Perl. In addition, your examples are mostly not what I would call code generation. Generating a lookup table from an HTML page is basically data manipulation and could be done as a config file rather than code. Generating documentation is, well, documentation. It's human-readable text, so you can't handle it as a library call the way you could with other code generation situations. I don't know enough about XSUB to comment on your last example, except to say that the rules are different in static languages like C where you really may not be able to do certain things as a subroutine.

        The canonical example for this discussion is generating a set of classes for manipulating database tables. Class::DBI does this in Perl, and it uses code generation, but it does the generation on the fly at run-time and doesn't produce intermediate source code that can be hand-edited and get out of sync. The use is still questionable in my opinion, but not as bad as it could be.
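        For anyone who hasn't seen the run-time flavour, a stripped-down sketch of the general idea follows. This is not how Class::DBI is implemented internally; the `Customer` class and its columns are invented, and the accessors are simply closures installed into the symbol table.

```perl
use strict;
use warnings;

package Customer;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Install one get/set accessor per column at run time. Nothing is
# written to disk, so there is no generated file to hand-edit.
for my $column (qw(name email)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$column" } = sub {
        my $self = shift;
        $self->{$column} = shift if @_;
        return $self->{$column};
    };
}

package main;

my $c = Customer->new(name => 'Ann');
$c->email('ann@example.com');
print $c->name, ' <', $c->email, ">\n";
```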

      I'm with William G. Davis in disagreeing with you on this. Code generation can mean that you've managed to express things in terms of a higher-level abstraction. I don't see the relevance of your distinction between code generated at runtime and that generated earlier.

      It would be perfectly reasonable to invent a new language (whether a generic language or something more domain-specific), and have the implementation of the language compile it to perl code.

      In fact a C compiler does exactly this: it generates an assembler program from the C code. This also, I think, answers the one issue I recognise from perrin's comment, on the danger of files getting out of sync if you hand-edit the intermediate results: it is a matter of expectation (I don't expect to edit the assembler source that the C compiler produces), reinforced by infrastructure (e.g. setting the generated files read-only) and protocol ("this is the procedure to change it").
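      A small sketch of the infrastructure half of that, assuming a Unix-style permission model: the generator stamps a warning header on each file and then drops write permission. The `write_generated` helper and the file name are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write generated output with a warning header, then drop write permission.
sub write_generated {
    my ($path, $body) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "# GENERATED FILE - DO NOT EDIT. Change the generator and rerun it.\n";
    print {$fh} $body;
    close $fh or die "Cannot close $path: $!";
    chmod 0444, $path or die "Cannot chmod $path: $!";
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = write_generated(File::Spec->catfile($dir, 'alpha.cgi'), "print qq{hello\\n};\n");
printf "mode: %04o\n", (stat $file)[2] & 07777;
```

      A read-only bit won't stop a determined emergency fix, but it does make the edit a deliberate act rather than an accident.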

      Hugo

        If you don't see the relevance of the distinction between code generated at runtime (sometimes called "active code generation") and that generated earlier ("passive code generation"), I'm guessing you haven't worked on a large project that used passive code generation. The problem is that there will be an emergency fix and someone will edit one of the generated files because it's much easier than trying to understand and change the code generation code. And then you are in big trouble.

        Anyway, both of you seem to be implying that I said something like "generating anything from anything else is always bad." That's not what I'm saying at all. Let me try to re-state it more clearly:

        In most cases where you could use code generation to solve a problem, you could also use a data structure and some subs to solve it. To quote from the wiki link I posted, "Anything you can do by generating code, I can do by calling data driven subroutines." Using subs is better because it is easier to understand (no need to parse two levels of code at once in your head) and avoids the danger of hand-editing.
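        A quick sketch of what "a data structure and some subs" can look like, using form validation as a stand-in problem. The field names and rule keys are invented; the point is only that the per-project differences live in `%rules` while one generic sub interprets them.

```perl
use strict;
use warnings;

# Validation rules as data (field names and rules are invented).
my %rules = (
    name  => { required => 1, max_length => 40 },
    email => { required => 1, pattern => qr/\A[^\@\s]+\@[^\@\s]+\z/ },
    notes => { max_length => 200 },
);

# One generic sub interprets the rules; no per-project code is emitted.
sub validate {
    my ($rules, $input) = @_;
    my @errors;
    for my $field (sort keys %$rules) {
        my $rule  = $rules->{$field};
        my $value = $input->{$field};
        if (!defined $value || $value eq '') {
            push @errors, "$field is required" if $rule->{required};
            next;
        }
        push @errors, "$field is too long"
            if $rule->{max_length} && length($value) > $rule->{max_length};
        push @errors, "$field is malformed"
            if $rule->{pattern} && $value !~ $rule->{pattern};
    }
    return @errors;
}

my @problems = validate(\%rules, { name => 'Ann', email => 'not-an-address' });
print "$_\n" for @problems;
```

        Adding a project then means adding a rules hash, not a new script.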

        I'm not claiming that there are no situations at all where code generation is required. A common reason to use active code generation is the performance gain you can sometimes get with it (e.g. templating systems often generate Perl code from templates).

        Even if you completely disagree, the wiki link I posted is pretty interesting reading, and makes good points on both sides.

Re: (OT) Generated Code vs. Libraries
by SpanishInquisition (Pilgrim) on Oct 21, 2004 at 15:06 UTC
    My career here has pretty much been sunk by being too smart for my own good. Some people fear change and will resent you for it, and will not want to hear opinions about what is wrong with a codebase -- be careful -- know your audience.

    But if it was me, I'd gut it. Nothing is worse than a codebase you don't like working in and can't change. I know, I live there :)

Re: (OT) Generated Code vs. Libraries
by Mutant (Priest) on Oct 21, 2004 at 16:16 UTC

    Thanks for your comments all. I think I'm going to take the risk and run off libraries. :)

      I'm replying a bit late, but I just wanted to point out something that you seem to have missed. Using "libraries" and "generated code" are not mutually exclusive techniques. It is perfectly feasible to use centralized code generation routines (i.e. generate code from libraries), if you want to keep going down that road. I would avoid it, but it's a distinct possibility.

      Another possibility that you seem to have overlooked is using code libraries, but not One Big Library that is used in every project. You indicate that some projects need customizations. This is still perfectly doable, even with code that is centralized and clean. You can make subclasses of certain libraries, for instance. Or you can make additional libraries to encapsulate the customized behavior. Once again, you seem to have set up two choices that are not actually opposing. You can have your cake and eat it too! :-)
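      A bare-bones sketch of the subclassing idea, with hypothetical class names: the shared behaviour sits in one base class, and a project that needs something special overrides only the method that differs.

```perl
use strict;
use warnings;

package Project::Base;   # hypothetical shared handler

sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub header { my $self = shift; return "== $self->{title} ==" }
sub body   { return 'standard form' }
sub render { my $self = shift; return join "\n", $self->header, $self->body }

package Project::Special;   # one project that needs custom behaviour
our @ISA = ('Project::Base');

# Override only the part that differs; everything else is inherited.
sub body { return 'custom report' }

package main;

print Project::Base->new(title => 'Alpha')->render, "\n";
print Project::Special->new(title => 'Beta')->render, "\n";
```

      A fix to `header` or `render` in the base class still reaches the customised project, which a hand-edited generated CGI would not get.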

      I hope my late reply still has some usefulness to you...

      Update: bibliophile's earlier post mentions the idea I was getting at:

      I'm actually going with a few libraries. The "common" common stuff that everything needs, and separate libs that distinct groups need.

Node Type: perlquestion [id://401139]
Approved by Arunbear
Front-paged by Arunbear