Over the last year I've built a web application called File Exchange Server for internal and external use using CGI::Application. It's working well, the code is modular and OO, and I even have tests for some of it.

However, some of the modules are getting large, and I'm considering splitting them into sub-modules. As usual, there are pros and cons to doing this.
Pros:
  • Smaller files -- less scrolling up and down to find stuff
  • Better modularity
Cons:
  • More files to manage
  • More interfaces -- may need a new Util module for each Object's useful routines
I'm the sole developer, so it's not like I can hold a meeting and take a vote. The three modules I'm looking at are about 1000 lines long, with the subs averaging 100-120 lines in length.

I've had a quick squint through Perl Best Practices and didn't see anything about this. Thoughts?

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Is there an ideal module size or subroutine size?
by McDarren (Abbot) on Aug 14, 2007 at 15:15 UTC
    With regards to the size of individual subroutines, I read somewhere a while back (and it may have been in PBP), that "if any subroutine takes up more than a single screen when viewed in your editor - then it is probably too big".

    Personally, I like that - and I try to use it as a general rule of thumb.

    Cheers,
    Darren :)

      (Hopefully not showing my age too much, but) I originally heard that n years ago as "more than a single page" when printed, but the sentiment survives. If a sub is spread over more than a single "contextual unit" (printed page, screen full) you start to spend more time trying to track where you are in the big picture rather than following the flow of the logic.

      (Of course when you run a 1280x1024 monitor rotated to portrait orientation "a single screen" can get pretty long . . . :)

      "if any subroutine takes up more than a single screen when viewed in your editor - then it is probably too big"

      For mine ... if any subroutine takes up more than a single screen when viewed in your editor - then you probably need a different editor, or a bigger screen ;-)

      Cheers,
      Rob
Re: Is there an ideal module size or subroutine size? (no,yes)
by tye (Sage) on Aug 14, 2007 at 15:38 UTC

    I've seen useful modules that were really only one line. But, of course, most useful modules are more than that. Some problems demand quite a large module. When part of the module's functionality can be split out into a logically consistent separate module (or even just a separate package), then that is often wise. If you can come up with a good name for this separate piece, then that is a good sign that you've identified a set of features that makes sense to separate (and coming up with a good name is extremely important when factoring out a chunk of features / code).

    So about the only time size is much of a concern for me with modules is when implementing a module calls for several small packages and I wonder whether it is better to put all of the packages in one file or split some or all of them into separate files (and there aren't other, more important factors motivating that decision -- usually one package per file is best). Or, rarely, size serves as a "red flag" telling me that I did a poor job of factoring the module's too-large functionality or of "reducing scope" to produce a manageable feature set.

    So, for modules, I think it is much more about the size and cohesiveness of the feature set than it is about number of lines of code. It doesn't take all that much to have several hundred lines of code in a module so I wouldn't consider 1000 lines of code in a module to be a "red flag".

    For subroutines, single-line subroutines are often useful, of course. But here I put a much stronger upper limit on what I consider appropriate size. Subroutines are a basic unit of abstraction so each one needs to be easy to understand as a unit. So I'm very much against overly long subroutines. I prefer subroutines of a couple dozen lines or fewer, so that they fit in a standard terminal window (24x80). In practice, I don't find this metric too difficult to stick to, especially in Perl, but I certainly don't consider it a hard limit. It is easy to justify going over, especially when coding style guidelines demand near-waste of vertical space with unnested opening curly braces, extensive intra-sub comments and vertical whitespace, or (as mine do) don't allow for flow control stuffed into a single line. But I'm very reluctant to leave a subroutine of over about 60 lines.

    If I can't view the entire subroutine at once on a larger-than-I-usually-use window or a page of paper, then it detracts from my ability to quickly and clearly understand it. And, of course, having a good name for each subroutine is very important, though many of those names only need to make sense within the context of their surrounding code.

    There are, of course, cases where it doesn't seem convenient to split subroutines down to that small a size. And you don't want to get bogged down obsessing over how to meet some arbitrary size limit, especially since not seeing how to do it often means that if you do it anyway you'll do it poorly. So putting it off can be a wise choice (and procrastination often means not having to do it). My experience is that such difficulties usually stem from missing a good factoring angle or technique. For example, making the code driven by data and having the data expand beyond one screen-/page-full is one technique. I don't mind repetitive data spanning pages nearly as much.

    - tye        

      A thoughtful reply. Thank you.

      I mentioned already that this is a CGI::Application program -- the subs are run modes, and the subs in the modules are all run modes related to users, files and projects. There may be some duplicated code, code that I can refactor, but I don't think I'm at that stage yet. The routines are basically ones that collect information from the CGI activity (GET or POST), do some database calls, and display the results using Template::Toolkit.

      It seems that I'm not in a Bad Place yet .. so I may well just hold off any drastic changes.

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Is there an ideal module size or subroutine size?
by roboticus (Chancellor) on Aug 14, 2007 at 15:34 UTC
    talexb:

    My ideal size of a subroutine is "one thought". More specifically, a subroutine ought to be short--easy to understand when separated from the rest of your application. If it gets too large, then usually it's because you have other thoughts in there that should be split out as their own subroutines. Sometimes, however, it's long because the thought isn't naturally decomposable into multiple smaller thoughts. I can't think of an example off-hand, because I find these to be rarities.

    ...roboticus

      A state machine implementation is often done as a "switch" statement (read "if ... elsif ..." in Perl) and can get pretty long even when most of the work for each state is factored out into individual subs. There are ways to avoid the "switch" (dispatch tables for example), but often the operation of the state machine becomes less obvious.


      DWIM is Perl's answer to Gödel

        A state machine is a perfect example of what I was talking about with moving the long stuff to a data structure. Then you have a small dispatching routine and probably a long, repetitive data structure but no long code that you are trying to understand the structure of.
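
        A minimal sketch of that idea, for illustration only: the states, events, and sub names below are invented and are not taken from anyone's real code. The transition table is the long, repetitive part; the dispatching routine stays a few lines long.

```perl
use strict;
use warnings;

# Transition table: state => { event => next state }.
# All state and event names here are hypothetical.
my %TRANSITIONS = (
    idle      => { upload => 'receiving' },
    receiving => { chunk  => 'receiving', done => 'stored', error => 'failed' },
    stored    => { delete => 'idle' },
    failed    => { retry  => 'receiving', delete => 'idle' },
);

# Small dispatching routine: look up the transition, die on anything illegal.
sub next_state {
    my ($state, $event) = @_;
    my $row  = $TRANSITIONS{$state} or die "Unknown state '$state'\n";
    my $next = $row->{$event};
    die "No transition from '$state' on '$event'\n" unless defined $next;
    return $next;
}

# Feed a list of events through the machine and return the final state.
sub run_machine {
    my ($state, @events) = @_;
    $state = next_state($state, $_) for @events;
    return $state;
}

print run_machine('idle', qw(upload chunk chunk done)), "\n";   # prints "stored"
```

        Adding a state means adding a row to the table; next_state() does not grow. Per-state actions could be stored as code references in the same table, at the cost of a slightly busier data structure.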

        I've worked a lot with state machines. With a non-trivial state machine, you don't look at the code to understand the state machine, you look at the state diagram. And the routines for each state are usually quite small, often tiny.

        For more casual state machines, factoring out the per-state actions is often important because otherwise you can't see how the states relate.

        So my guidelines for factoring out code from a state machine routine based on size are the same as for other code. And it is usually easier with state machine code because one state makes a perfect chunk to factor out.

        - tye        

        That's a very good example. When you write & document software, you don't normally use state machines. But when you do, it's because that's the best representation of that piece of the problem. And that state machine is the concept that you need to map to a subroutine, which naturally maps to a larger-than-normal subroutine. (Which thankfully has a simple, repetitive form, as you mention.)

        ...roboticus

Re: Is there an ideal module size or subroutine size?
by akho (Hermit) on Aug 14, 2007 at 15:25 UTC
    Code Complete suggests that the optimum size of a method is at 100-200 lines (the data is supported by pseudo-scientific references from the 1980s, too). On the other hand, these numbers should obviously depend on the language used and on the problem solved. Mine are usually much shorter.

    Refactoring only because you think your modules are too long is not a really good practice. One should think about what these modules represent and how easy they will be to fix (enormous modules may mean your subs are very tightly coupled, which is not a good thing). If, however, there is no obvious way to split the module's responsibility — it should stay as a whole despite its size.

    1000 lines is rather typical and probably fine, though. Unless they are written in a very terse, domain-specific way, of course — count concepts, not lines.

Re: Is there an ideal module size or subroutine size?
by clinton (Priest) on Aug 14, 2007 at 15:16 UTC
    It's a matter of personal preference, but I have very few subs which are more than about 30 lines in length - that way, they fit on the screen, and the logic is easy to follow. If it is getting too long, I remove a set of lines which together form a single logical unit and put that in a separate sub. The name of the new sub reflects that single piece of logic, and so makes the original code a lot easier to follow.
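
    As a rough illustration of that kind of extraction (the sub names and the 10 MB limit are made up for the example), the validation lines form one logical unit, so they move into a sub whose name says what they do:

```perl
use strict;
use warnings;

# Extracted helper: a single logical unit (input validation).
# Returns a list of error strings, empty if the input is acceptable.
sub validate_upload {
    my (%args) = @_;
    my $size = $args{size} // 0;
    my @errors;
    push @errors, 'missing filename'
        unless defined $args{filename} && length $args{filename};
    push @errors, 'empty file'     unless $size > 0;
    push @errors, 'file too large' if $size > 10 * 1024 * 1024;   # assumed limit
    return @errors;
}

# The caller now reads as a short list of steps.
sub handle_upload {
    my (%args) = @_;
    my @errors = validate_upload(%args);
    return 'rejected: ' . join(', ', @errors) if @errors;
    return "accepted: $args{filename}";
}

print handle_upload(filename => 'report.pdf', size => 2048), "\n";
print handle_upload(filename => '',           size => 0),    "\n";
```

    A side benefit is that validate_upload() can be tested on its own, without going through the caller.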

    I have a few modules which are about 1000 lines in length, but these start getting difficult to maintain. Most of my modules are less than 500 lines.

    I'm wondering how much repeated code you have in your 3 modules - with a bit of refactoring, you could probably split those modules up, and make more of code reuse. Besides being easier to follow, they'll make maintenance less of a nightmare.

    Clint

Re: Is there an ideal module size or subroutine size?
by perrin (Chancellor) on Aug 14, 2007 at 17:34 UTC
    Many times, I have had a well-organized OO module organically grow to the point where I could no longer quickly find the part of the code I needed to look at in my editor. Some people use code folding to help with this, but I find it kind of annoying. When I've taken the time to break up a large module, it has improved my ability to find things in the code, and also makes it easier to teach to someone else, since the API is in smaller bites.
Re: Is there an ideal module size or subroutine size?
by derby (Abbot) on Aug 14, 2007 at 18:10 UTC

    Well I hate to get all academic-y but I'm a big fan of using complexity metrics for measuring subroutines/methods/functions. The higher the complexity metric (ie, more conditionals) - the harder it is to test and that's a good reason to refactor a subroutine. As for modules ... well just like the maxim says, take care of your subroutines and your modules will take care of themselves.
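
    For a feel of what such a metric measures, here is a deliberately crude sketch: it scores a chunk of source as one plus the number of branching keywords and short-circuit operators it finds. It is a regex approximation and an assumption of this example, not a real parser: it ignores the ternary operator and will miscount keywords that appear inside strings, comments, or POD.

```perl
use strict;
use warnings;

# Rough complexity score: 1 + number of decision points found by a regex.
sub rough_complexity {
    my ($source) = @_;
    my $count = () = $source =~
        /\b(?:if|elsif|unless|while|until|for|foreach|and|or)\b|&&|\|\|/g;
    return 1 + $count;
}

my $sample = <<'END';
sub classify {
    my ($n) = @_;
    if ($n < 0) { return 'negative' }
    elsif ($n == 0) { return 'zero' }
    return 'big odd' if $n > 10 && $n % 2;
    return 'other';
}
END

print rough_complexity($sample), "\n";   # prints 5
```

    The absolute number matters less than the comparison: a sub that scores far higher than its neighbours is a candidate for splitting.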

    -derby

    Update: CPAN continues to amaze me.

Re: Is there an ideal module size or subroutine size?
by mr_mischief (Monsignor) on Aug 15, 2007 at 01:05 UTC
    Few of my modules get over about 600 lines, and most of them are under 400 lines.

    There are several reasons for that:
    • I'm often working with fairly simple concepts.
    • I let CPAN modules do much of the heavy lifting when I can.
    • I use mostly OO these days. I, personally, would rather have more simple objects than fewer complex ones -- as simple as makes sense, but no simpler. I usually break my modules along class lines.
    • I've found I can reuse modules more easily the less they do. That means I'll write a module meant for reuse and another that either wraps or subclasses it for just the initial project if I think most of the module will be useful elsewhere but want to keep a bunch of extra stuff out of it.
    • Modularity can save space when reusing code if you don't need everything from one project in another, too.
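
    A small sketch of the reusable-module-plus-subclass split described in the list above. The class names are hypothetical, and both packages share one file only to keep the example self-contained; normally each would live in its own .pm file.

```perl
use strict;
use warnings;

# Reusable piece: knows only how to hold and list names.
package Roster;

sub new {
    my ($class, @names) = @_;
    return bless { names => [@names] }, $class;
}

sub add {
    my ($self, $name) = @_;
    push @{ $self->{names} }, $name;
    return $self;    # return the object so calls can be chained
}

sub names {
    my ($self) = @_;
    return @{ $self->{names} };
}

# Project-specific piece: subclasses Roster and adds the extras that only
# this one project needs, keeping them out of the reusable module.
package Project::Roster;
our @ISA = ('Roster');

sub as_html_list {
    my ($self) = @_;
    # No HTML escaping here; a real version would need it.
    return join '', '<ul>', map("<li>$_</li>", $self->names), '</ul>';
}

package main;

my $roster = Project::Roster->new('alice')->add('bob');
print $roster->as_html_list, "\n";
```

    The next project can take Roster as it is and leave the HTML extras behind.
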
    OTOH, modules which deal with big, monolithic things that are really complex might need to be big, monolithic things that are really complex. A number of CPAN modules have files that are multiple thousands of lines long. MIME::Lite is over 3k lines, IIRC. PDF::API2 has a few files over 30k lines among its supporting modules (all of which appear to be East Asian font sets). PDF::API2::Simple sits at over 1000 lines. DBI.pm is over 7k lines.

    One project I've been developing and maintaining for months has a number of modules specific to it. It also has a number of modules reused from other projects. Some of those have been improved and re-reused back into the original projects. There are several modules I'm using from CPAN for this project that are 2000 lines or longer, but none of the ones I've written are more than 1000 lines. That's one of the joys of CPAN. Someone else has done many of the big, hairy modules.
Re: Is there an ideal module size or subroutine size?
by samizdat (Vicar) on Aug 15, 2007 at 13:36 UTC
    I think there have been several good points raised here, which I'll try to highlight, and I have a bit of my own to add. YMMV.

    First off: "one thought" per function. Time spent organizing your code into structure, data, and actions is always time well spent, especially in a long-running project that will evolve and morph. When I start making stupid coding mistakes at 11 pm, I go home and sleep. Then I start in the next morning on a structural re-write.

    Second: data-driven programming. I totally agree with tye that rolling up constraints and actions into data structures is a really good thing to do. Think in terms of the high-level activity you're trying to accomplish:
    • recognize a pattern from this set, grab the pertinent data, and perform a corresponding action from another set
    • perform a sequence of actions based on a sequence of tokens (little languages)
    • parse a pattern and perform actions based on the state of the system (state machines)
    • etc., etc., etc.
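
    The first item in the list above might look something like this sketch. The command names and actions are invented for the example; the point is that the patterns and their actions sit together in a data structure, and the code that walks it stays tiny.

```perl
use strict;
use warnings;

# Ordered list of [pattern, action] pairs; the first matching pattern wins.
# Captured groups are passed to the action as its arguments.
my @RULES = (
    [ qr/^get\s+(\S+)$/i         => sub { "fetch $_[0]" } ],
    [ qr/^put\s+(\S+)\s+(\d+)$/i => sub { "store $_[0] ($_[1] bytes)" } ],
    [ qr/^quit$/i                => sub { 'bye' } ],
);

sub dispatch {
    my ($line) = @_;
    for my $rule (@RULES) {
        my ($pattern, $action) = @$rule;
        if (my @captures = $line =~ $pattern) {
            return $action->(@captures);
        }
    }
    return "unrecognized: $line";
}

print dispatch($_), "\n" for 'GET notes.txt', 'put a.bin 42', 'quit', 'dance';
```

    Supporting a new command means adding one line to @RULES rather than another elsif branch.
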
    Finally: parameter globalization. One of the big bugaboos people run into while attempting to chop modules down to size is that they have been taught not to use global variables. Function parameter lists that are longer than the code are a Bad Idea. :) In web coding, especially with the better templating systems, persistent state variables can really simplify your code. In non-web code as well, global system objects and class variables can really help clarify what's going on.

    In all of this, the goal is to help you abstract and understand your code so that you can zero in on the right place to tweak. Yes, you will have more modules, but if you lay them out properly, you'll also have clear categories of modules and be able to dive into the right directory and file to get to the code you need.

    Don Wilde
    "There's more than one level to any answer."
Re: Is there an ideal module size or subroutine size?
by p6steve (Sexton) on Aug 17, 2007 at 21:52 UTC
    Well - I am an intermediate level (perl) programmer and I use a couple of rules of thumb:
    • create a sub where you use the same code more than once (that means twice!, or more)
    • generalize subs via parameters to avoid having multiple similar subs
    • if you have a long (i.e. >500-line) sequence of "meat & potatoes" code, then it is a good idea to reduce your "main" routine to a 30-line-ish sequence of calls to subs ... this is really a way to self-document the code and forces you to limit use of global variables to make the code more re-usable
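
    A compact sketch of the last two rules, with invented sub names and stand-in data: one parameterized sub replaces several near-identical ones, and the top-level routine is just a sequence of calls with data passed explicitly rather than through globals.

```perl
use strict;
use warnings;

# One sub with a unit parameter instead of separate per-unit variants.
sub total_size {
    my ($sizes, $unit) = @_;
    my %divisor = (B => 1, KB => 1024, MB => 1024 * 1024);
    die "Unknown unit '$unit'\n" unless exists $divisor{$unit};
    my $total = 0;
    $total += $_ for @$sizes;
    return $total / $divisor{$unit};
}

# Stand-in for whatever really supplies the data (files, a database, ...).
sub read_sizes {
    return [ 1024, 2048, 5120 ];
}

sub format_report {
    my ($total, $unit) = @_;
    return sprintf '%.1f %s', $total, $unit;
}

# The top-level routine reads like a table of contents.
sub run {
    my $sizes = read_sizes();
    my $total = total_size($sizes, 'KB');
    return format_report($total, 'KB');
}

my $report = run();
print "$report\n";   # prints "8.0 KB"
```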