Fellow Monks,

This started out as a SOPW entitled Hash Gravity, but somewhere around revision sixty or so, I felt that a Meditation might be more appropriate. Apologies in advance if it's mis-posted.

One of my responsibilities is maintaining several CGI, LDAP, and GroupWise E-mail directories and mailing lists. I use Perl to manipulate the core database of account information. The data starts out as flat text files, which are massaged into hash structures and then written out with the MLDBM module and Storable's nstore function for later use.
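For anyone new to this kind of pipeline, here is a minimal sketch of the flat-file-to-hash step followed by a Storable round trip. The input layout (SSN, then that system's fields separated by spaces), the %fields table, and the sample lines are assumptions for illustration; the real file format and the MLDBM side are not shown in the post.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical per-system field lists; the first field (ppo) holds the system code.
my %fields = (
    GRP => [qw(ppo userid email gwpo gwdomain)],
    JAG => [qw(ppo userid email jagexcept)],
);

# Hypothetical flat-file lines: SSN followed by that system's fields.
my @lines = (
    '222222222 GRP mjones mjones@grp.site.edu ABCDE ABCDEMAIL',
    '555555555 JAG dohhh dohhh@jag.site.edu Y',
);

my %accounts;
for my $line (@lines) {
    my ($ssn, @values) = split ' ', $line;
    my $sys   = $values[0];               # ppo doubles as the system key
    my $names = $fields{$sys} or next;    # skip systems we don't know about
    my %rec;
    @rec{@$names} = @values;              # hash slice: field name => value
    $accounts{$ssn}{$sys} = \%rec;
}

# nstore writes in network byte order, so the file is portable across platforms.
my (undef, $file) = tempfile(UNLINK => 1);
nstore(\%accounts, $file);
my $copy = retrieve($file);
print "$copy->{222222222}{GRP}{gwdomain}\n";
```

The hash slice keeps the field names in one table, so adding a field to a system means editing one list rather than the parsing code.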

Recently I discovered that it was necessary to preserve account information for people with multiple E-mail accounts. In the past a single account was hierarchically selected as their primary account and the others ignored. Thus the system has taken on a whole new dimension (sorry, couldn't resist).

I've already got code and structures that work, but I'm curious about alternate (read more Perlish) ways to implement the structures. Note that readability is an issue since I'm the only Perl user in my office.

Below is a sub that illustrates the structures in question:

sub DumpAccounts {
    # Systems in print order, and the fields to print for each one.
    my @SYS = ('GRP', 'JAG', 'CIS', 'MST', 'GCG');
    my %POvars = (
        'GRP' => ['ppo', 'userid', 'email', 'gwpo', 'gwdomain'],
        'JAG' => ['ppo', 'userid', 'email', 'jagexcept'],
        'CIS' => ['ppo', 'userid', 'email'],
        'MST' => ['ppo', 'userid', 'email'],
        'GCG' => ['ppo', 'userid', 'email'],
    );
    # %accounts is global: $accounts{$ssn}{$sys}{$var}
    foreach my $ssn (sort keys %accounts) {
        print "$ssn ";
        foreach my $sys (@SYS) {
            if (exists $accounts{$ssn}{$sys}) {
                foreach my $var (@{ $POvars{$sys} }) {
                    print $accounts{$ssn}{$sys}{$var} . " ";
                }
                # Indent continuation lines past the 9-digit SSN and its space.
                print "\n" . " " x 10;
            }
        }
        print "\n";
    }
}
Sample results:
123456789 JAG jblow jblow@jag.site.edu N
222222222 GRP mjones mjones@grp.site.edu ABCDE ABCDEMAIL
555555555 GRP hsimpson hsimpson@grp.site.edu FGHIJ FGHIJMAIL
          JAG dohhh dohhh@jag.site.edu Y
          GCG homers homers@gcg.site.edu
It seemed that using text indices ($accounts{'ssn'}{'sys'}{'vars'}) was easier for me to implement than trying to figure out the numerically indexed alternative, $accounts{'ssn'}[sys][var]. I'm not sure the latter is desirable given the varying length of the {'vars'} dimension. But even if I went with it, wouldn't the code be much more complex?
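For comparison, a minimal sketch of an array-indexed layout for the per-system fields. It assumes hypothetical `use constant` names for the positions so the code still reads somewhat like the hash version; systems stay as hash keys and only the field level becomes an array. The sample values come from the output above.

```perl
use strict;
use warnings;

# Hypothetical names for array positions; the meaning of 3 and 4 varies by system.
use constant { PPO => 0, USERID => 1, EMAIL => 2, EXTRA1 => 3, EXTRA2 => 4 };

my %accounts = (
    '555555555' => {
        GRP => [qw(GRP hsimpson hsimpson@grp.site.edu FGHIJ FGHIJMAIL)],
        JAG => [qw(JAG dohhh dohhh@jag.site.edu Y)],
    },
);

# The shared positions read the same for every system...
for my $sys (sort keys %{ $accounts{'555555555'} }) {
    my $rec = $accounts{'555555555'}{$sys};
    print join(' ', $rec->[PPO], $rec->[USERID], $rec->[EMAIL]), "\n";
}

# ...but EXTRA1 is gwpo for GRP and jagexcept for JAG, so the caller must
# already know which system a record came from to interpret it.
my $grp_po   = $accounts{'555555555'}{GRP}[EXTRA1];
my $jag_flag = $accounts{'555555555'}{JAG}[EXTRA1];
```

The generic EXTRA1/EXTRA2 names are where readability suffers: with hash keys, 'gwpo' and 'jagexcept' document themselves.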

Being a multi-lingual kind of person, I tend to gravitate toward using data structures in ways that simplify coding and increase readability. Hence the crux of this post: what are your opinions about balancing data structures, code complexity, and readability?

Edited by footpad - 10 Aug 01, ~10:00 am (PDT)

Use ADT's to Balance everything!
by dragonchild (Archbishop) on Aug 10, 2001 at 21:26 UTC
    Heh. Having just finished the chapter in Code Complete on Abstract Data Types over lunch, I've got a few comments for you.
    1. Use ADTs. They will make your life so much easier, and your code will be more readable and maintainable. (Read on for how!)
    2. Use accessor methods into your ADT. Rip out every single direct access into your data structure(s) and replace it with well-defined, loosely coupled accessor routines: things like GetEmailAccounts(), RemoveEmailAccount(), and AddEmailAccount(). Your code using this ADT will be very easy to read. It might even become self-documenting!
    3. Put your ADT into a module. Maybe, even an object! Then (and here's the kicker) ...
      NEVER EVER DIRECTLY ACCESS THE INTERNALS!!!
    4. Let me repeat it.
      NEVER EVER DIRECTLY ACCESS THE INTERNALS!!!
    5. If you do, you will be worse off than you are now, 'cause you'll change the implementation and something will break and you won't know why and you won't know where and you thought you had it covered 'cause you had these fancy accessors but someone violated the contract! *deep breath*
    The point here is data-hiding/encapsulation/all that stuff. Add on well-named and well-defined accessor methods and the only place your data structure is ugly is in the module(s) that define/access it. Sorta like CGI, DBI, or IO::File, if you think about it. Make your own CGI and the people who follow will bless your name one thousand times.
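A minimal sketch of that advice, reusing the accessor names from point 2. The class name AccountList, the method signatures, and the internal layout are all hypothetical; the point is that only these methods touch the hash.

```perl
use strict;
use warnings;

# Hypothetical ADT: callers go through these methods, never the hash inside.
package AccountList;

sub new {
    my ($class) = @_;
    return bless { by_ssn => {} }, $class;
}

# Record one system's account for a person; $fields is a hashref of field => value.
sub AddEmailAccount {
    my ($self, $ssn, $sys, $fields) = @_;
    $self->{by_ssn}{$ssn}{$sys} = { %$fields };   # keep a copy, not the caller's hash
    return $self;
}

# Return a person's email addresses, ordered by system name.
sub GetEmailAccounts {
    my ($self, $ssn) = @_;
    my $systems = $self->{by_ssn}{$ssn} or return;
    return map { $systems->{$_}{email} } sort keys %$systems;
}

# Drop one system's account; forget the person entirely once none remain.
sub RemoveEmailAccount {
    my ($self, $ssn, $sys) = @_;
    my $systems = $self->{by_ssn}{$ssn} or return;
    my $removed = delete $systems->{$sys};
    delete $self->{by_ssn}{$ssn} unless %$systems;
    return $removed;
}

package main;

my $list = AccountList->new;
$list->AddEmailAccount('555555555', GRP => { email => 'hsimpson@grp.site.edu' });
$list->AddEmailAccount('555555555', JAG => { email => 'dohhh@jag.site.edu' });
my @before = $list->GetEmailAccounts('555555555');
$list->RemoveEmailAccount('555555555', 'GRP');
my @after = $list->GetEmailAccounts('555555555');
print "@after\n";
```

Because callers see only these methods, the internal hash could later become arrays, objects, or a tied file without any change to calling code.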


      I like your point. The combination of style and maintainability that permeates Code Complete has influenced my perception of good programming since I first read it. Abstracted a level, it's a work with pertinence in contexts far removed from programming.
Re: Balancing Complexities of Data Structures, Code and Readability
by petral (Curate) on Aug 10, 2001 at 22:55 UTC
    I know you posted this to ask for different ways to do it, so my comment isn't really fair. But...

    A principle from Extreme Programming is "You ain't gonna need it". What you have now is probably fine. You're sorting by users and then by systems per user, which I guess is what you meant by 'multiple e-mail accounts'. If you're calling the function more than once per run, you could move @SYS and %POvars outside the function, since they only need to be set once, but that's about it.
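A minimal sketch of moving the tables out of the sub so they are built once. Passing the accounts hash by reference and returning the report as a string (instead of printing from a global) are my additions, not part of the original sub; the sample data is taken from the output shown earlier.

```perl
use strict;
use warnings;

# Lookup tables built once at file scope rather than on every call.
my @SYS    = qw(GRP JAG CIS MST GCG);
my %POvars = (
    GRP => [qw(ppo userid email gwpo gwdomain)],
    JAG => [qw(ppo userid email jagexcept)],
    CIS => [qw(ppo userid email)],
    MST => [qw(ppo userid email)],
    GCG => [qw(ppo userid email)],
);

# Takes a reference to the accounts hash and returns the report as a string.
sub DumpAccounts {
    my ($accounts) = @_;
    my $out = '';
    for my $ssn (sort keys %$accounts) {
        my @lines;
        for my $sys (grep { exists $accounts->{$ssn}{$_} } @SYS) {
            my $rec = $accounts->{$ssn}{$sys};
            push @lines, join ' ', map { $rec->{$_} } @{ $POvars{$sys} };
        }
        # Continuation lines are indented 10 spaces to line up with the first system.
        $out .= "$ssn " . join("\n" . ' ' x 10, @lines) . "\n";
    }
    return $out;
}

my %accounts = (
    '555555555' => {
        GRP => { ppo => 'GRP', userid => 'hsimpson', email => 'hsimpson@grp.site.edu',
                 gwpo => 'FGHIJ', gwdomain => 'FGHIJMAIL' },
        JAG => { ppo => 'JAG', userid => 'dohhh', email => 'dohhh@jag.site.edu',
                 jagexcept => 'Y' },
    },
);
print DumpAccounts(\%accounts);
```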

    Basically, having %accounts set up the way you need it is what really matters. Perl lets you process it this way and that whenever you need to, without much fuss.

    Only if it's looking like a lot of different scripts (or a few really complex ones) are going to be kept around and need to be maintained, or you find yourself typing the same code over and over to access it, would you need to go further. Then work out a simple, useful interface (e.g., @accts = $accts->user($ssn); hmm, is that simpler?) and build an encapsulating object.

    update: I didn't emphasize the central advantage of waiting, which is that by waiting until you need it, you'll know much more specifically what you want to do.

      p

      I agree with most of the above comments, and in particular, I would also stress how important it is to have "constant" data outside of your function.

      Additionally, eliminating duplication is also a priority, though not at the expense of needless complexity. In this case, @SYS is really the same as keys %POvars, so there is no need for this extra definition.

      Further along those lines, I would restructure your definition something like:
      my @POvars_common = qw[ ppo userid email ];
      my %POvars = (
          'GRP' => [@POvars_common, 'gwpo', 'gwdomain'],
          'JAG' => [@POvars_common, 'jagexcept'],
          'CIS' => [@POvars_common],
          'MST' => [@POvars_common],
          'GCG' => [@POvars_common]
      );
      Although this doesn't seem like a big deal, removal of duplication can help with:
      • Accidental typing errors, such as one of your entries having 'emall' instead of 'email', which are hard to spot amid many similar lines. Running Perl with the -w option and use strict will catch mistyped variable names, but not mistyped string constants, unless those happen to cause errors as well.
      • Errors when changing the structure on a global scale, which requires modifications to every single entry in this case. This usually happens to every program, so plan for it in advance.
      • Synchronization errors between needlessly linked data structures, such as @SYS and %POvars, where one has an entry which the other does not.
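One way to take this a step further, assuming the print order of systems matters (in the original sub it came from @SYS, whereas keys %POvars returns names in no particular order): keep a single ordered list of [system, system-specific fields] entries and derive both @SYS and %POvars from it. The @SYSTEMS name is hypothetical.

```perl
use strict;
use warnings;

my @POvars_common = qw(ppo userid email);

# One ordered list: each entry is [system, system-specific fields...].
my @SYSTEMS = (
    [qw(GRP gwpo gwdomain)],
    [qw(JAG jagexcept)],
    [qw(CIS)],
    [qw(MST)],
    [qw(GCG)],
);

# Both lookups are derived, so they cannot drift out of sync.
my @SYS    = map { $_->[0] } @SYSTEMS;
my %POvars = map {
    my ($sys, @extra) = @$_;
    ($sys => [@POvars_common, @extra]);
} @SYSTEMS;

print "$_: @{ $POvars{$_} }\n" for @SYS;
```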
      That's not what I find.

      I tell co-workers that "programs get more complex over time" and encourage them to plan for it.

Re: Balancing Complexities of Data Structures, Code and Readability
by kjherron (Pilgrim) on Aug 10, 2001 at 22:11 UTC
    Just to expand on dragonchild's comment, I'd be inclined to create a package for each kind of account, with a "new" method to create a blessed object from the data in the original text file. Each $accounts{ssn} would contain an anonymous array of these objects. Creating %accounts might be done as follows:
    while (<>) {
        my ($ssn, $obj) = parse_account($_);
        push @{ $accounts{$ssn} }, $obj;
    }
    parse_account() of course would have to be able to determine the account type that the string defines, then call the appropriate account-object constructor. Each individual account-type package would provide a standard set of methods to access account information. E.g. to print out all the email addresses you'd implement a method called "email" within each package; it could be called like this:
    while (my ($ssn, $acctlist) = each %accounts) {
        foreach my $acct (@$acctlist) {
            print "$ssn ", $acct->email, "\n";
        }
    }
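A minimal sketch of parse_account() feeding the two loops above. It assumes (hypothetically) that each input line holds the SSN, then the system code, then that system's fields separated by spaces, and that each system has a package named Account::<SYSTEM>. The %fields_for table and both packages are illustrative only.

```perl
use strict;
use warnings;

# Two minimal, hypothetical account-type packages.
package Account::GRP;
sub new   { my ($class, %f) = @_; return bless {%f}, $class }
sub email { $_[0]{email} }

package Account::JAG;
sub new   { my ($class, %f) = @_; return bless {%f}, $class }
sub email { $_[0]{email} }

package main;

# Hypothetical field order for each system after "SSN SYSTEM" on a line.
my %fields_for = (
    GRP => [qw(userid email gwpo gwdomain)],
    JAG => [qw(userid email jagexcept)],
);

# Returns ($ssn, $object); the system code picks the class to construct.
sub parse_account {
    my ($line) = @_;
    my ($ssn, $sys, @values) = split ' ', $line;
    my $names = $fields_for{$sys} or die "Unknown system '$sys' in: $line";
    my %f;
    @f{@$names} = @values;
    return ($ssn, "Account::$sys"->new(%f));
}

my %accounts;
for ('555555555 GRP hsimpson hsimpson@grp.site.edu FGHIJ FGHIJMAIL',
     '555555555 JAG dohhh dohhh@jag.site.edu Y') {
    my ($ssn, $obj) = parse_account($_);
    push @{ $accounts{$ssn} }, $obj;
}

while (my ($ssn, $acctlist) = each %accounts) {
    print "$ssn ", $_->email, "\n" for @$acctlist;
}
```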
    A simple, individual "email" method would just look like:
    sub email {
        my $self = shift;
        return $self->{'email'};
    }
    If you wanted to get fancy, all of the individual account-type packages could inherit from a more generic package; this would let you avoid having to create all these duplicate trivial methods.
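A minimal sketch of that inheritance idea, assuming a hypothetical base class Account that supplies new() and generates one read-only accessor per field name, so each subclass only lists its extra fields. The class names, fields(), and make_accessors() are illustrative; CPAN modules such as Class::Accessor package up the same pattern more completely.

```perl
use strict;
use warnings;

package Account;

# Field names every account type has; subclasses append their own.
sub fields { qw(userid email) }

# Shared constructor: keep only the fields this class declares.
sub new {
    my ($class, %args) = @_;
    my %self = map { ($_ => $args{$_}) } $class->fields;
    return bless \%self, $class;
}

# Install a read-only accessor for each name into the calling class.
sub make_accessors {
    my ($class, @names) = @_;
    no strict 'refs';
    for my $name (@names) {
        *{"${class}::$name"} = sub { $_[0]{$name} };
    }
}
Account->make_accessors(Account->fields);

package Account::GRP;
use parent -norequire, 'Account';
sub fields { ($_[0]->SUPER::fields, qw(gwpo gwdomain)) }
Account::GRP->make_accessors(qw(gwpo gwdomain));

package Account::JAG;
use parent -norequire, 'Account';
sub fields { ($_[0]->SUPER::fields, qw(jagexcept)) }
Account::JAG->make_accessors(qw(jagexcept));

package main;

my $acct = Account::GRP->new(userid => 'hsimpson', email => 'hsimpson@grp.site.edu',
                             gwpo => 'FGHIJ', gwdomain => 'FGHIJMAIL');
print $acct->email, ' ', $acct->gwdomain, "\n";   # email() is inherited from Account
```

Adding a new account type then comes down to a package with a fields() list and one make_accessors() call.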