periapt has asked for the wisdom of the Perl Monks concerning the following question:

I seem to have painted myself into a bit of a corner here while developing some library modules. I'm not sure I understand the basics of modules well enough to work my way through this. Basically, I have a set of modules that serve as a library for a suite of programs. One problem is that two of these library modules each require at least one (unrelated) function from the other.

It works but doesn't feel clean to me; it feels like an accident waiting to happen. I have always been uncomfortable with the design but don't quite know how to fix it. This node, Circular usage, has only served to heighten my feelings of disquiet.

My questions first (then background)

  • Are there better ways to structure these libraries so that they do not have to reference each other?
  • Does it matter (in terms of system resources) how many functions are contained in a module if a program only imports a couple of functions?
  • Is there any harm/advantage associated with requiring/using a module inside another module? at the head of a file? in the specific function that requires it?
  • Are there any advantages/disadvantages to having one large (many functioned) module over several smaller modules?

    Some background: I have a suite of programs that processes a set of data. Each of the eight or nine programs in this suite operates on the same data set but does different things. Some are ETL programs, others do analysis and reporting, and some are system maintenance programs.

    Basically, I have a library of around twenty general functions (Gen.pm) that perform routine actions used by the various parts of the system, for example converting three atomic fields into one compound field. Individual data elements are grouped in logical blocks, and there are checks to see if a particular block is blank, some constants and general-purpose variables, that sort of thing. At least one general function requires a format validation that is contained in the Spec.pm library.

    The next library (Spec.pm) contains a few hundred specific validation functions, which enforce a standard on individual data elements. Naturally, one of the validations depends on whether a block is blank, which is determined by a function in the Gen.pm library, and so on ...

    Not every program uses these functions at the same time, so one program might import the function in Spec.pm that requires Gen.pm and not import the function in Gen.pm that requires Spec.pm. Other programs import the lot from both libraries.

    I have been thinking about just combining both libraries into one and letting it go at that, but I don't really know the advantages/disadvantages involved. One complication is that I will soon be implementing another round of validations that will have to run in parallel with the first but which will be significantly different. That will give me a third module with approx 600 functions. These specs will likely reference both Gen.pm and Spec.pm.

    Any thoughts will be most appreciated :o)

    Update:
    Many thanks for the great ideas. Funny that abstracting the dependent function out into a base library never occurred to me.

    Another alternative occurred to me last night. I could just move the entirety of Gen.pm into Spec.pm. To avoid rewriting the programs that require it, I could cast Gen.pm as a wrapper module that exports the needed functions from Spec.pm. Does anyone have any experience with that kind of setup? I still have the issue of a module (Spec.pm) with a few hundred functions in it, if that is an issue at all. What exactly is the effect on program size and resource usage of such a module?
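
A minimal sketch of the wrapper idea described above, not a tested design. The function names (block_is_blank, valid_code) are hypothetical, and both packages are squeezed into one file purely so the sketch runs on its own; in practice each BEGIN block would be its own .pm file and the %INC lines would go away.

```perl
use strict;
use warnings;

# Stand-in for Spec.pm after Gen.pm's functions have been moved into it.
# Both function names are hypothetical.
BEGIN {
    package Spec;
    use Exporter 'import';
    our @EXPORT_OK = qw(block_is_blank valid_code);

    # Formerly in Gen.pm: true when no field holds a non-space character.
    sub block_is_blank {
        my @fields = @_;
        my $filled = grep { defined($_) && $_ =~ /\S/ } @fields;
        return $filled == 0;
    }

    # A validation that relies on the blank check.
    sub valid_code {
        my ($code) = @_;
        return 0 if block_is_blank($code);
        return $code =~ /\A[A-Z]{3}\d{2}\z/ ? 1 : 0;
    }

    $INC{'Spec.pm'} = __FILE__;    # only needed because this sketch is one file
}

# Gen.pm reduced to a wrapper: it imports from Spec and re-exports the same name.
BEGIN {
    package Gen;
    use Spec qw(block_is_blank);
    use Exporter 'import';
    our @EXPORT_OK = qw(block_is_blank);

    $INC{'Gen.pm'} = __FILE__;     # only needed because this sketch is one file
}

# An existing program keeps its old import line unchanged.
use Gen qw(block_is_blank);

print block_is_blank('', undef, '  ') ? "blank\n" : "not blank\n";
```

With this arrangement the dependency runs one way only (Gen loads Spec, Spec loads nothing), and the name imported through Gen refers to the very same subroutine that lives in Spec.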

    PJ
    use strict; use warnings; use diagnostics;

    Re: Modules that reference each other
    by dragonchild (Archbishop) on Sep 29, 2004 at 15:02 UTC
      Circular dependencies almost always mean that you have not modularized appropriately. If Foo depends on Bar and Bar depends on Foo, at least one of those dependencies needs to be moved into a new module, Baz, so that Foo depends on Bar and Bar depends on Baz (where Baz holds what used to be in Foo).

      Remember - each thing should do one thing, one thing only, and do that one thing well. The art is in making the boxes.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

    Re: Modules that reference each other
    by SpanishInquisition (Pilgrim) on Sep 29, 2004 at 14:33 UTC
      "At least one general function requires a format validation which is contained in the Spec.pm library"

      Sounds like this is the dependency you need to remove. The program/general-library should not know the validation piece exists.

      Meanwhile, what you really need to do is modularize your program so that you don't have to export anything. (Perl OOP -- when done at a minimal level -- isn't that painful and it doesn't have to be overdone if you don't want to overdo it).
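
One way this could look, as a rough sketch only: two small classes that export nothing, with the validation handed to the general class by the caller so that the general code never loads the validation code itself. Passing a code reference is an assumption of this sketch rather than something the reply prescribes, and the method names (compound_field, valid_date) are hypothetical.

```perl
use strict;
use warnings;

package Gen;    # hypothetical general-purpose class; nothing is exported

sub new {
    my ($class, %args) = @_;
    # The validator is supplied by the caller, so Gen never refers to Spec.
    my $self = { validator => $args{validator} || sub { 1 } };
    return bless $self, $class;
}

# Build one compound field from several atomic fields, then validate it.
sub compound_field {
    my ($self, @parts) = @_;
    my $joined = join '-', @parts;
    die "invalid compound field: $joined\n"
        unless $self->{validator}->($joined);
    return $joined;
}

package Spec;   # hypothetical validation class; nothing is exported

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub valid_date {
    my ($self, $value) = @_;
    return $value =~ /\A\d{4}-\d{2}-\d{2}\z/ ? 1 : 0;
}

package main;

# Only the program knows about both classes and wires them together.
my $spec = Spec->new;
my $gen  = Gen->new(validator => sub { $spec->valid_date(@_) });

print $gen->compound_field(2004, '09', 29), "\n";    # prints 2004-09-29
```

Here the program is the only place that mentions both Gen and Spec, which is the direction the reply points in: the general library stays unaware of the validation piece.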

    Re: Modules that reference each other
    by Velaki (Chaplain) on Sep 29, 2004 at 15:20 UTC

      One possible solution to mutual definitions is to refactor the individual classes, and try to abstract the prerequisites into "parent" classes.

      What does that mean? Well, basically you pull out the functions needed by both packages, and put them into a parent package. You might need to do this for a number of layers to completely decouple the mutual requirements. Essentially, it's akin to many-to-many relationships in databases, which are frequently resolved by using aggregated relationships.
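
A small sketch of that refactoring, assuming class-method style calls. The package name Common and all method names are hypothetical; `use parent -norequire` is used only because the three packages share one file here.

```perl
use strict;
use warnings;

package Common;    # hypothetical parent holding what both children need

sub is_blank {
    my ($class, $value) = @_;
    return !defined $value || $value !~ /\S/;
}

sub has_format {
    my ($class, $value, $pattern) = @_;
    return defined $value && $value =~ $pattern;
}

package Gen;
use parent -norequire, 'Common';    # -norequire: Common lives in this same file

# Join the non-blank parts into one compound value.
sub compound {
    my ($class, @parts) = @_;
    return join '', grep { !$class->is_blank($_) } @parts;
}

package Spec;
use parent -norequire, 'Common';

# Validate a five-digit code using only inherited helpers.
sub check_zip {
    my ($class, $value) = @_;
    return !$class->is_blank($value)
        && $class->has_format($value, qr/\A\d{5}\z/);
}

package main;

print Gen->compound('a', ' ', 'b'), "\n";            # prints ab
print Spec->check_zip('12345') ? "ok\n" : "bad\n";   # prints ok
```

Neither Gen nor Spec mentions the other; each depends only on the parent.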

      Just a thought,
      -v
      "Perl. There is no substitute."

    Re: Modules that reference each other
    by mstone (Deacon) on Sep 29, 2004 at 18:09 UTC

      The basic solution is to put all the circular code into the same module.

      You might be able to do that simply by moving everything into one of your existing modules, or you might need to create a whole new module. From a design perspective, it's usually best to start by trying to build a completely new module. That tends to highlight the problem that created the circular dependency in the first place.

      Functions are easy to move if they don't refer to anything outside themselves. This function:

      sub easy_to_move {
          my @params = @_;
          my $result = join ',', @params;    # stand-in for some simple manipulation of @params
          return $result;
      }

      can be picked up and moved anywhere without changing a thing. It isn't aware of anything except its parameter list, so it doesn't care where it lives. Functions that call other functions can be slightly harder to move, but not much.

      The thing that makes functions hard to move is a dependency on some kind of data. This function:

      sub hard_to_move {
          my @params = @_;
          my $anchor = $STATIC_VARIABLE;         # module-level data declared elsewhere in the module
          my $result = join $anchor, @params;    # stand-in for something that involves $anchor
          return $result;
      }

      is hard to move because it's handcuffed to a specific piece of data within the module. Moving the function means finding a way to move the data as well. Modules make it easy to share data among several functions, so the odds are fairly good you'll find some kind of data anchor holding your code in a specific place. Circular dependency itself is often a sign that your data hasn't been distributed among the modules properly.

      That happens. Programming is research, and at every stage you make enough assumptions to carry you to the next stage. Sometimes, along the way, you find subtle problems in your assumptions. That doesn't make you a bad programmer, it just means you're working on something new. The only times you don't run into trouble like that are when you're working on a toy problem that doesn't present any real complexity, or when you're working on something that's been done so many times before that there are no surprises left.

      To manage the dependencies, you have to manage your data. To manage your data, you give up the convenience of having everything right there at your fingertips, and pass the data as a parameter.

      If you only have to turn one or two values into parameters to make the function portable, just go ahead and pass the values directly. If you find your parameter list getting ridiculously long, create a data structure and pass that. If you do end up creating a data structure, look at the possibility of making that structure a parameter for the entire module, not just for the function you want to move.
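
To make the progression above concrete, here is a short sketch with hypothetical names throughout: first a function anchored to file-scoped data, then the same function with that value passed in, then a version that takes one structure in place of a growing parameter list.

```perl
use strict;
use warnings;

# Before: the function is anchored to file-scoped data.
my $SEPARATOR = '|';

sub join_anchored {
    my @fields = @_;
    return join $SEPARATOR, @fields;
}

# After: the one value it needed arrives as a parameter,
# so the function can live in any module.
sub join_fields {
    my ($separator, @fields) = @_;
    return join $separator, @fields;
}

# When the list of values grows, pass a single structure instead.
sub format_record {
    my ($config, @fields) = @_;
    my $body = join $config->{separator}, @fields;
    return $config->{prefix} . $body . $config->{suffix};
}

my %config = (separator => '|', prefix => '<', suffix => '>');

print join_anchored(qw(a b c)), "\n";              # prints a|b|c
print join_fields(',', qw(a b c)), "\n";           # prints a,b,c
print format_record(\%config, qw(a b c)), "\n";    # prints <a|b|c>
```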

      Many circular dependencies can be solved by partitioning the code into 'data storage' and 'data manipulation' modules. The basic rules of such modules are that 'storage' modules do only that. They don't contain any manipulation code that isn't absolutely trivial. 'Manipulation' modules, OTOH, don't contain any data of their own. They just manipulate whatever the storage module gives them.

      Most code contains a mix of those two kinds of behavior, and usually that's no problem. When you run into circular dependencies, though, it's time to get very strict about your division of responsibilities until the dependency problems go away.
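
A rough sketch of that strict split, assuming a simple record of named fields. RecordStore and RecordOps are hypothetical names: the first only stores, the second holds no data and works on whatever store it is handed.

```perl
use strict;
use warnings;

# Storage module: holds the record and offers nothing beyond trivial get/set.
package RecordStore;

sub new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

sub get { my ($self, $name) = @_; return $self->{$name} }
sub set { my ($self, $name, $value) = @_; $self->{$name} = $value; return $self }

# Manipulation module: owns no data, works only on the store it is given.
package RecordOps;

# True when none of the named fields holds a non-space character.
sub block_is_blank {
    my ($store, @names) = @_;
    for my $name (@names) {
        my $value = $store->get($name);
        return 0 if defined $value && $value =~ /\S/;
    }
    return 1;
}

# Combine several atomic fields into one compound field.
sub compound_field {
    my ($store, @names) = @_;
    return join '-', map { my $v = $store->get($_); defined $v ? $v : '' } @names;
}

package main;

my $record = RecordStore->new(year => 2004, month => '09', day => 29);

print RecordOps::compound_field($record, qw(year month day)), "\n";    # prints 2004-09-29
print RecordOps::block_is_blank($record, qw(street city)) ? "blank\n" : "filled\n";    # prints blank
```

Because RecordOps never reaches for data of its own, its functions can be moved between modules without dragging anything along.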

    Re: Modules that reference each other
    by krishnamurphy (Initiate) on Sep 30, 2004 at 10:18 UTC
      One option might be pulling the routines that you need to "have non-circular" out of each of these libraries into something like Base.pm - then you can pull the routines you need from it into either of the two under review, as needed. That way you don't have to have one huge library with lots of routines remaining unused under many circumstances.
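
A minimal sketch of that layout, assuming plain Exporter-based function libraries. All function names are hypothetical, and the three packages share one file only so the sketch runs on its own; in practice each BEGIN block would be its own .pm file without the %INC lines.

```perl
use strict;
use warnings;

# Base.pm: the routines that Gen and Spec used to borrow from each other.
BEGIN {
    package Base;
    use Exporter 'import';
    our @EXPORT_OK = qw(block_is_blank valid_format);

    sub block_is_blank {
        my @fields = @_;
        my $filled = grep { defined($_) && $_ =~ /\S/ } @fields;
        return $filled == 0;
    }

    sub valid_format {
        my ($value, $pattern) = @_;
        return defined($value) && $value =~ $pattern ? 1 : 0;
    }

    $INC{'Base.pm'} = __FILE__;    # only needed because this sketch is one file
}

# Gen.pm: depends on Base only.
BEGIN {
    package Gen;
    use Base qw(valid_format);
    use Exporter 'import';
    our @EXPORT_OK = qw(compound_date);

    sub compound_date {
        my ($year, $month, $day) = @_;
        my $date = sprintf '%04d-%02d-%02d', $year, $month, $day;
        return valid_format($date, qr/\A\d{4}-\d{2}-\d{2}\z/) ? $date : undef;
    }

    $INC{'Gen.pm'} = __FILE__;
}

# Spec.pm: depends on Base only.
BEGIN {
    package Spec;
    use Base qw(block_is_blank);
    use Exporter 'import';
    our @EXPORT_OK = qw(check_address);

    sub check_address {
        my @lines = @_;
        return block_is_blank(@lines) ? 0 : 1;
    }

    $INC{'Spec.pm'} = __FILE__;
}

# A program imports only what it needs from either library.
use Gen  qw(compound_date);
use Spec qw(check_address);

print compound_date(2004, 9, 29), "\n";    # prints 2004-09-29
print check_address('', '  ') ? "address ok\n" : "address missing\n";    # prints address missing
```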