gri6507 has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,

Months ago I did a toy project to explore one automation concept for our work environment. The idea is that we had an archaic tool that took a "batch" file and executed it. Problem was that no two jobs required the same batch files and manually editing those files was error prone. So, I created a Tk GUI to automate this process. In a zealous swing, I extended this tool to also execute that batch file on the hardware and parse the resulting output. As of right now, the program is over 2000 lines of code.

It didn't take management long to see the usefulness of this tool and the advantages it had over the original way. This, of course, is great! However, as I stated earlier, this was just an exercise, and as such, new functionality was added piece by piece to the already long program. I know that soon it will be decided to "productize" my tool for our customers, yet the program is built in a fix-on-top-of-a-hack style. So I am asking for your guidance on how to better organize it.

I found one reference on PerlMonks to this question 62740, but it doesn't actually answer any questions. I think what I would like to do is break my program up into modules, but I can't seem to figure out how to do it cleanly. It is a GUI tool with added threads, socket communication, and intra-tool communication links (bound tags) dynamically activated based on results, using plenty of global variables. Given this mess, can you recommend some reading to help me better transform the tool from its current version to a maintainable version? Thanks in advance.

Replies are listed 'Best First'.
Re: reworking a large program
by Zaxo (Archbishop) on Feb 09, 2004 at 19:34 UTC

    This will be painful, but I think the best start will be to write documentation and tests. That should expose whatever unities may have survived what sounds like a normally haphazard development.
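
    As a minimal sketch of what such a test might look like, assuming a routine has already been lifted out of the script: the function name format_batch_line and the "NAME=VALUE" line format below are invented for illustration and do not come from the original tool. The point is only to pin down current behaviour with Test::More before changing anything.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for one routine lifted out of the existing script.
# The "NAME=VALUE" batch-line format is invented for this sketch.
sub format_batch_line {
    my ($name, $value) = @_;
    die "missing name\n" unless defined $name && length $name;
    return sprintf '%s=%s', uc($name), $value;
}

# Record what the code does today, so later changes can be checked against it.
is( format_batch_line('voltage', 5),   'VOLTAGE=5', 'name is upper-cased' );
is( format_batch_line('mode', 'fast'), 'MODE=fast', 'value is left untouched' );
ok( !eval { format_batch_line('', 1); 1 },          'empty name is rejected' );

done_testing();
```

    Once a handful of these exist, each routine can be moved or rewritten and the tests rerun to confirm nothing changed.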

    You probably have three major independent chunks. The first, the GUI/event mechanism, is probably easy to encapsulate, since it depends on Tk and has a clear interface in the event handlers and dialogs (or whatever). The second is the control mechanism that generates the "batch file" and parses the output; generating and parsing may turn out to be two chunks rather than one. The last is the set of interfaces to the external tools you call.

    That leaves out utility modules that you will probably need. You say there are lots of global variables. Trying to limit those to smaller scopes will go a long way towards exposing closely related code. Those that are truly global become part of your configuration module.
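
    A configuration module along those lines could be sketched roughly as follows. The package name MyTool::Config and every key and value in it are made up for this example, and in a real layout the package would live in its own .pm file. The package-block syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical configuration module; normally this would be its own file
# (for example MyTool/Config.pm). All keys and values are invented.
package MyTool::Config {
    my %settings = (
        batch_dir => '/tmp/batches',
        host      => 'localhost',
        port      => 4000,
    );

    # Read-only accessor: callers cannot modify settings by accident,
    # and a misspelled key fails loudly instead of returning undef.
    sub get {
        my ($class, $key) = @_;
        die "unknown config key '$key'\n" unless exists $settings{$key};
        return $settings{$key};
    }
}

package main;

# A former global such as $port becomes an explicit lookup.
printf "connecting to %s:%d\n",
    MyTool::Config->get('host'), MyTool::Config->get('port');
```

    The design choice here is that the hash is lexical to the package, so the only way in is through get, which makes every use of a "global" easy to find with a search.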

    I hope you're granted time to do this right. Have fun!

    After Compline,
    Zaxo

Re: reworking a large program
by Fletch (Bishop) on Feb 09, 2004 at 19:12 UTC

    Get a copy of Refactoring by Fowler (ISBN 0201485672). It's geared more towards Java or C++, but the general principles are still applicable.

      I find Fowler's process to be highly mechanized. I took some courses that were sub'bed for a few weeks by one of his fellow disciples (hardcore into eXtreme Programming too), and have been to one of Fowler's guest lectures. It seemed rather obvious, but also mechanical in a bad way. He tends to propose doing things very incrementally, and uses the compiler too often to double check that his code is valid. He doesn't state general concepts so much as he wants to shorten L.O.C.

      I am a huge proponent of refactoring code (and do so constantly), but Fowler gets too much credit, IMHO, for spearheading the effort.

      One of the scarier attributes of Fowler's style (and his disciples') is decomposing methods until they are as small as possible. What was the stat? Methods averaging 2-4 lines of code? That adds call overhead for every tiny method and can often go too far. I think that also makes debugging rough. Depending on the method, 2-4 lines may be just right, but this is not to say 10 lines is overkill.

      His disciple, for instance, had a holy war against using the "if" statement, and wrote entire programs using ternaries and anonymous inner classes to avoid the if. Ack! Run away!

      I haven't read the book, but his 2 hours of lecture (which had the goal of selling the book) kept me from having any interest in the book (or at least paying for the dead trees). I'm very much into clean OO, but Fowler takes clean and somehow makes it feel dirty.

      The above post by saberwolf pretty much sums up my opinion of the lecture. It was like a coding lecture for people who should never have been coders.

      Aside: My favorite engineering professor ever (not C.S., but that doesn't matter), once had a very interesting "how people learn" lecture, where he stipulated that C.S. folks, more so than other engineers, tend to think globally rather than iteratively. It's sort of this "a ha!" process rather than something mathematical like a proof. This kind of thinking makes us good at what we do (analyzing huge systems without getting bogged down by details), and to me, I can't learn from Fowler's iterative mechanical steps. Give me basic concepts, no instructions, and I like bullet points and lots of pictures. Thanks, Dr. Porter :)

      I have just finished reading this book. I bought it because I have just inherited a mountain of shoddy perl code and I was hoping it could help.

      I found it to be the biggest waste of my time ever. The author has a tendency to "talk down" to the reader, and furthermore, spends most of his time bragging about himself and his friends. I found the writing style to be very annoying.

      Once I got past that and into the "meat" of his "refactorings," it seemed like all he wanted to do was place labels on things that are (or should be?) common sense. He refers quite a bit to various "patterns" which of course can be found in other books. It seems like he wants his "refactorings" to be remembered as people remember the patterns from these other books. Unfortunately, while I did find a lot of the patterns useful (or at least interesting), none of his refactorings seemed to have any basis in reality, or if they did, they were so blatantly obvious that it seems pointless to devote 6 pages to it.

      For example, many of his refactorings are things like "move method" or "move field" - and of course he spends pages and pages discussing how to move a method. COME ON.

      Of course there are probably a few meaningful and useful things in this book. His insistence that you edit in small steps and test regularly to make sure things keep working is of course a great thing to do.

      I think something more useful would be to think of how you would design the program if you were starting from scratch. Thus, read up on program design and separation of business logic from presentation and object design and what have you. Once you have an idea of how you would program it were you to start from scratch, start examining what you have so far and think of what small steps you could take to bring your current "design" in line with your ideal design.

      Ditto to Fletch. There is also a nice article from Perl.com here.

      Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
        This article seems very useful. Moreover, the concept of Refactoring is quite novel, but I'd think its usefulness decreases with the size of the program while the time spent on refactoring increases in what I would guess is an exponential manner.
Re: reworking a large program
by waswas-fng (Curate) on Feb 09, 2004 at 19:34 UTC
    I would approach management with the goal of defining a project with scope and requirements for this app, let them know that this first run was a "proof of concept" (if you have not made that point already) and that the application needs to be rewritten/restructured to be supportable/usable in the wild. After management approves the project, then sit down and re-factor -- my gut feeling on things like this is that if it was not meant to be production as it was written, it is usually easier to rewrite from scratch to bring it up to production quality.


    -Waswas
Re: reworking a large program
by mvc (Scribe) on Feb 09, 2004 at 20:17 UTC

    It is good that your manager gave you refactoring resources. Some managers do not understand the wisdom of keeping quality constant and high, sometimes even paying for someone to remove code from their software.

    See this link for some content on escaping from the big ball of mud.

Re: reworking a large program
by graff (Chancellor) on Feb 10, 2004 at 05:03 UTC
    ... no two jobs required the same batch files and manually editing those files was error prone. So, I created a Tk GUI to automate this process.
    Just a terminological nit-pick: a GUI does not automate a process -- if it is well-designed, it optimizes the interactive part of the process, so the user's task is reduced to the fewest, easiest, most intelligible inputs and decisions, making efficient use of human abilities in a job that would be too hard to "automate".

    As you approach the rewrite, examine the user's experience with the current tool: Are there parts of the task that seem unnecessarily repetitive or redundant? Are there parts that involve irregular changes among distinct sub-tasks (as opposed to having similar sub-tasks grouped together)? What parts are still error-prone for the user, and what sorts of validation/sanity-check are provided to locate and correct likely errors (automatically or on demand)?

    The point of this would be to start coding a new version from "first principles": what does the user really need to do here, and what would be the simplest, most effective way to enable that?

    In a zealous swing, I extended this tool to also execute that batch file on the hardware and parse the resulting output. As of right now, the program is over 2000 lines of code.
    Think of the output parsing as a separate process. You can tie it to the "batch-prep" step as tightly or loosely as you like, but write it as a separate piece of code. If it involves knowledge that is also used in "batch-prep", then that shared knowledge is the basis for your new modules -- you need to write the code to encapsulate that knowledge just once, to objectify it. (The "run-batch" step can be the last part of the "prep" or the first part of the "output parse" -- or both, if both the prep and the parse have distinct controls that might motivate a fresh run to test a different set of choices.)
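
    To illustrate the "separate piece of code" idea, here is a rough sketch of a parser that knows nothing about the GUI or about how the batch was run: it takes text and returns a plain data structure. The "NAME: PASS|FAIL" line format and the name parse_results are assumptions made up for this example, since the real output format is not described in the thread.

```perl
use strict;
use warnings;

# Parse tool output into a plain hash reference. The "NAME: PASS|FAIL"
# line format is invented here; substitute the real output format.
sub parse_results {
    my ($text) = @_;
    my %result;
    for my $line (split /\n/, $text) {
        # Lines that do not look like a result are skipped, not fatal.
        next unless $line =~ /^\s*(\w+)\s*:\s*(PASS|FAIL)\s*$/;
        $result{$1} = $2 eq 'PASS' ? 1 : 0;
    }
    return \%result;
}

my $results = parse_results("adc_check: PASS\nnoise line\ndac_check: FAIL\n");
for my $name (sort keys %$results) {
    print "$name => $results->{$name}\n";
}
```

    Because the function only sees a string, it can be tested with canned output and reused whether the batch was run from the GUI, from a command line, or not at all.
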
    It is a GUI tool with added threads, socket communication, and intra-tool communication links (bound tags) dynamically activated based on results, using plenty of global variables. Given this mess, can you recommend some reading...
    Could some of this complexity have arisen from the "exploratory" or "toy-project" nature of the initial development? This might be another reason to re-examine the user's perspective and re-formulate your "first principles"; if it's a basic "configure -> run -> assess" sort of task, perhaps you don't need threads, or perhaps the inter-/intra-process communications can be reduced to a simple pipeline model. If you're not using a relational database, would it help to use one (in case it makes the program simpler to code and/or simpler to use)?

    Anyway, I don't think I'd recommend reading as much as writing... How long would it take to describe the relevant facts about the batch file format and the output of the batch process? How would you document the range of controls and inputs that the user needs to handle when prepping the batch file? How hard would it be to scope out and describe what the user needs to do with the output? If you write that up first, and derive clear data structures to capture it all, then organizing and writing the actual code will seem to follow naturally from that.

Re: reworking a large program
by artist (Parson) on Feb 09, 2004 at 19:37 UTC
    Given that /you/ have done all the work so far, write up the architecture of your intended product. Define the inputs/outputs, the flow of data, and the functionality of your functions/programs properly. Spend some time on design work and you will get your rewards.
      I couldn't agree more with the above post. Plan, plan, plan.

      I've just begun reworking a large-ish system that I didn't write. I wish I were in your shoes instead of mine in this case because I first have to go in and dig through a ton of (poorly written) code to find out what it does. You at least have the advantage of knowing this already.

      When you do get around to writing your new code, though, please take into consideration the next guy who will have to rework your code. Document everything and use coding techniques that are easy to follow.

      And most importantly, Good Luck! Your ideas for the new app sound intriguing. Makes me want to go write some code myself. :)
Re: reworking a large program
by demerphq (Chancellor) on Feb 15, 2004 at 22:29 UTC

    Refactoring is an ongoing practice. You should be constantly doing it every time you code. Whenever you end up with three routines that do basically the same thing they should get replaced. (Perl offers a wide range of ways to do this intelligently, from code generation, to closures, to complex argument handling schemes.) You shouldn't have any global variables. Either encapsulate that data into an object, or pass it as required as parameters.
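
    As one possible illustration of the closure approach, suppose there were three near-identical range-checking routines. The routine names, labels, and limits below are invented for this sketch; a single factory returns one closure per limit instead.

```perl
use strict;
use warnings;

# Instead of three near-identical routines (say check_voltage,
# check_current, check_temp), one factory builds a closure per limit.
# All names and limits here are hypothetical.
sub make_range_check {
    my ($label, $min, $max) = @_;
    return sub {
        my ($value) = @_;
        return $value >= $min && $value <= $max
            ? "$label ok"
            : "$label out of range ($min..$max)";
    };
}

# The limits live in one table, passed in rather than read from globals.
my %check = (
    voltage => make_range_check('voltage', 0, 12),
    current => make_range_check('current', 0, 2),
    temp    => make_range_check('temp', -40, 85),
);

my $msg = $check{voltage}->(5);
print "$msg\n";
$msg = $check{temp}->(120);
print "$msg\n";
```

    Each closure carries its own $label, $min and $max, so adding a fourth check is one more line in the table rather than another copied routine.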

    Without seeing your code in depth, and the problem space it operates in, we can't suggest anything solid. It's a skill you learn: where to put data and routines, what modularizes nicely, what doesn't. Expect to get it wrong a few times. It's not uncommon for me to rewrite from scratch the same body of code two or three or even four times before I'm happy with the design.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi