in reply to Re: using references as keys in a hash.
in thread using references as keys in a hash.

Personally, I tend to use parallel hashes, ...

If two data structures are related, make that relationship OBVIOUS. Parallel data structures are not obviously related. In fact, it's a maintenance nightmare.

Let's set up a thought experiment. There are four parallel data structures. It doesn't matter at all what they are, except they have the following properties:

I am your maintenance programmer. I come along and am told there is a bug in the fubar() function and I need to fix it in 24 hrs. I go and realize that I need this value to make it right. I don't know that the value is in this fourth data structure. But, I need to fix fubar() right now. So, I add some crazy structure to get that fourth value into fubar(). The code is now worse.

All of that is avoided by using a second level of data structures. Thus, this set of config-type parameters is handed around as one reference. I, the hapless maintainer, am shown by the very way the data is structured that my needed value is there for me already. I don't need to hack the code up and make my job harder, just to do my job.
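
A minimal sketch of the difference, in Perl. The hash names (%title, %author, %isbn, %book), the key b1, the helper field_names() and the placeholder values are all made up for illustration; none of them come from a real program.

```perl
use strict;
use warnings;

# Parallel hashes: three separate structures that happen to share keys.
# Nothing in the code itself says they belong together.
my %title  = (b1 => 'Some Title');
my %author = (b1 => 'Some Author');
my %isbn   = (b1 => '0-000-00000-0');

# Nested structure: one record per key, handed around as one reference.
my %book = (
    b1 => {
        title  => 'Some Title',
        author => 'Some Author',
        isbn   => '0-000-00000-0',
    },
);

# Any function that receives the record can list every field it carries.
sub field_names {
    my ($record) = @_;
    return sort keys %$record;
}

my @fields = field_names($book{b1});
print join(', ', @fields), "\n";
```

A function that is handed $book{b1} has the isbn field in reach even if it never used it before; a function that is handed only $title{b1} has no hint that %isbn exists.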

(And, in case you're thinking that this is a contrived thought experiment ... maintenance programmers are often given that exact task, with about that level of knowledge about the system. It's not a perfect world out there. It is our job as developers to think about the maintainer who will come after us. You will maintain at some point in your career and will thank the developer with forethought.)

(If you think your code won't be maintained, remember this: that's what the mainframe developers in the 1970s thought when they used 2-digit years. I mean, who's going to keep this code around for 30(!) years?)

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.


Replies are listed 'Best First'.
Re: Parallel structures are NOT maintainable
by jonadab (Parson) on Feb 23, 2003 at 21:01 UTC
    If two data structures are related, make that relationship OBVIOUS.

    I agree with that.

    Parallel data structures are not obviously related.

    It seems obvious to me that if you see them assigned together, they're related. I did say it was a matter of style, however, and I expected some people to have a strong preference for the nested structures. I do use the nested structures in some cases, when what I want to do is a little more complex, or if there are multiple levels of nesting, or some other good reason. And I gave the example of using nested structures first. Don't read more into my statement about parallel structures than is there.

    In fact, it's a maintenance nightmare. Let's set up a thought experiment.

    Thought experiments can lead you to conclude that a heavier object will always fall faster than a lighter one. (They can also be useful, but you have to take them cum grano salis.)

    I am your maintenance programmer.

    Oooh, oooh, can I imagine that I named all my variables with single characters and used recursive nested evals wherever possible? ;-)

    I come along and am told there is a bug in the fubar() function and I need to fix it in 24 hrs. I go and realize that I need this value to make it right. I don't know that the value is in this fourth data structure. But, I need to fix fubar() right now. So, I add some crazy structure to get that fourth value into fubar(). The code is now worse.

    The code will always be worse when someone who is not familiar with the code attempts to fix something right now without understanding how it works. No amount of wonderful data structure will change that. (This is not an argument for bad data structures; I'm merely pointing out that no data structure can prevent the scenario you describe.)

    Furthermore, unless I'm missing something, there's nothing magic about the syntax of nesting that will alert the unfamiliar programmer to the existence of more data than is being used in the piece of code he's viewing. A simplistic example...

    sub foobar {
      my ($object, $result);
      foreach $object (@_) {
        $result .= "Title:\t"  . $$object{title}  . "\n"
                 . "Author:\t" . $$object{author} . "\n"
                 . "-------------------------------\n";
      }
      return $result;
    }

    Will the programmer know to look in $$object{ISBN} for the piece of data he needs to fulfill the change request? Maybe, but if so it's not any more obvious than (with parallel structures) looking in $isbn{$key}. If he reads through the well-commented code, he'll find it either way.

    Of course, if the code is more complex and has a larger number of fields, then the nested structure can be traversed more efficiently, avoiding the bug in the first place...

    sub foobar {
      my ($object, $result, $f);
      foreach $object (@_) {
        foreach $f (sort @fields) {
          $result .= "$f:\t" . $$object{$f} . "\n";
        }
        $result .= "-------------------------------\n";
      }
      return $result;
    }

    But the original poster is talking about what is currently a single hash storing a single value for each key, and I was suggesting also storing the unstringified reference used to create the hash key. That's a total of two fields: not complex enough to really need the nested structure, IMO. Yes, the nested structure will solve the problem nicely, but the parallel structure will also work.
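
    The two-field case described above might look roughly like this. It is only a sketch: the variable names ($thing, %value, %ref_for, %entry), the field names orig and value, and the sample data are assumptions, not the original poster's code.

    ```perl
    use strict;
    use warnings;

    my $thing = { name => 'widget' };   # hypothetical object used as a hash key

    # Hash keys are always strings, so the reference is stringified
    # (something like "HASH(0x...)") and the key alone cannot be dereferenced.

    # Parallel style: a second hash with the same keys holds the real reference.
    my (%value, %ref_for);
    $value{$thing}   = 42;
    $ref_for{$thing} = $thing;

    # Nested style: one hash whose entries carry both fields.
    my %entry;
    $entry{$thing} = { orig => $thing, value => 42 };

    for my $key (keys %value) {
        print "parallel: $ref_for{$key}{name} => $value{$key}\n";
    }
    for my $key (keys %entry) {
        print "nested:   $entry{$key}{orig}{name} => $entry{$key}{value}\n";
    }
    ```

    Both loops recover the same object and the same value; the difference is only whether the two fields live in one entry or in two hashes that must be kept in step.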

    Note that I'm not saying that parallel structures are better, or even that they're as good in every case; I only said that which you use is a matter of style. The program will get the same result either way.


    sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$ v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while ($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H= "16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab
      A few comments:
      1. I'm merely pointing out that no data structure can prevent the scenario you describe.

        No, but it can mitigate it. I am saying that, all things being equal, nested data structures contain more information intrinsic to their structure than parallel ones do.

      2. Will the programmer know to look in $$object{ISBN} for the piece of data he needs to fulfill the change request?

        Yes, he/she will. Why? Because they can go look at the definition for $object and see that there is an ISBN member. (You do have object definitions, right?) Failing that, Data::Dumper is one of the maintenance programmer's best friends. But, Data::Dumper doesn't know about that parallel data structure, does it? If it doesn't, it takes more time for me to find the right solution.

        (Nit: Use $object->{ISBN} instead ... $$object{ISBN} can lead to subtle bugs. Another maintenance headache, not a style issue.)

      3. The program will get the same result either way.

        This is a horrible statement, especially in this argument. Not only is it a tautology, but I don't care what hoops the computer has to go through to understand what I want it to do. COMPUTER RESOURCES ARE (NEARLY ALWAYS) CHEAPER THAN HUMAN RESOURCES.

      Let me put that last point another way. My programming services cost more per week than a top-of-the-line linux server. People like merlyn and others cost at least twice that. Is it worth it to you for me to spend 40 hours figuring out how to save a meg of RAM?

      It is worth a lot of money to write code that encapsulates as much information as possible in as many ways as possible that will be guaranteed to change as the code changes. Parallel data structures are a matter of style, yes. They are a poor choice of style because they will cost more money in maintenance than nested structures.
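
      The Data::Dumper point in item 2 can be sketched as follows. The record, its field values and the key k1 are placeholders invented for the example; only Dumper() and $Data::Dumper::Sortkeys are real parts of the module.

      ```perl
      use strict;
      use warnings;
      use Data::Dumper;

      $Data::Dumper::Sortkeys = 1;   # stable key order in the dump

      # Hypothetical nested record; values are placeholders.
      my $object = {
          title  => 'Some Title',
          author => 'Some Author',
          ISBN   => '0-000-00000-0',
      };

      # Dumping the one reference reveals every field, including ISBN.
      my $nested_dump = Dumper($object);
      print $nested_dump;

      # With parallel hashes, dumping %title says nothing about %isbn.
      my %title = (k1 => 'Some Title');
      my %isbn  = (k1 => '0-000-00000-0');
      my $parallel_dump = Dumper(\%title);
      print $parallel_dump;
      ```

      The first dump lists ISBN alongside title and author; the second shows only the title, so the maintainer still has to discover %isbn some other way.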


        Nit: Use $object->{ISBN} instead ... $$object{ISBN} can lead to subtle bugs.

        I meant to ask before, and forgot: can you elaborate on this? Assuming for the moment that $object is a real reference here, not a "symbolic reference" string (we'll pretend for the moment that I was using strict; if the program were complex enough to span more than about half a dozen subroutines I would be), what subtle bugs would (or could) my syntax lead to? At first I thought you meant that someone might write $object{foo} by mistake instead of $$object{foo}, but then I realised warnings or strict either one would catch that, so you must be talking about something more subtle... but what?
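
        No answer to this question appears in the thread as captured here. One candidate, offered only as an assumption about what the nit meant, is the precedence rule documented in perlref: in $$object{ISBN} the dereference applies to $object before the subscript does. The variables below are invented to make the difference visible.

        ```perl
        use strict;
        use warnings;

        # Two unrelated variables that share a name (hypothetical setup).
        my %object = (ISBN => \'from the hash %object');        # a hash named %object
        my $object = { ISBN => 'from the reference $object' };  # a scalar holding a hash ref

        # $$object{ISBN} parses as ${$object}{ISBN}: dereference $object, then index.
        my $a1 = $$object{ISBN};

        # The arrow form means the same thing, with no precedence to remember.
        my $a2 = $object->{ISBN};

        # ${ $object{ISBN} } is different: index the hash %object, then dereference.
        my $a3 = ${ $object{ISBN} };

        print "$a1\n$a2\n$a3\n";
        ```

        All three lines compile cleanly under strict and warnings, so a reader who guesses the wrong grouping gets no help from the compiler.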


        Is it worth it to you for me to spend 40 hours figuring out how to save a meg of RAM?

        Maybe, maybe not. A meg of RAM may not seem much, but things tend to add up. If it's a long running process, a meg of RAM every now and then does add up. And suddenly, you reach the memory limit your OS allows for a process, requiring a restart of the process every two to three days. You might say, so what? But some business models just don't accept that.

        Abigail

        Let me put that last point another way. My programming services cost more per week than a top-of-the-line linux server. People like merlyn and others cost at least twice that. Is it worth it to you for me to spend 40 hours figuring out how to save a meg of RAM?

        Huh? Where did that come from? Forty hours to find and fix a single small issue? If I spent an entire five-hour shift fixing some little thing like that, I'd feel like I didn't get anything done that day. In a thirty-hour work week, I could rewrite the application from scratch and have time left over to unstick printers, bug the APCC tech support people about our ongoing PowerChute issue, teach an Introduction to the Internet class to a group of senior citizens, reboot my coworkers' Windows systems for them as necessary ("I restarted it. It should be better now."), run a couple of custom reports for my boss (and write one-off Perl scripts to turn them into meaningful data), and redo the stylesheets for the cgi scripts in question just because I felt like it.

        Either we're talking about programs so different in size that it's not remotely meaningful to talk about them in the same conversation (as I suspected when I read your previous message upthread), or else one of us is seriously mispaid (since I don't make anything like the kind of wage you are talking about).

        I said that for complex situations the nested structures are better, but it seems to me that deeply complex programs are the only kind you are willing to concede might ever exist. Take a deep breath; some of us do simple stuff sometimes.


Re: Parallel structures are NOT maintainable
by dws (Chancellor) on Feb 23, 2003 at 17:33 UTC
    All of that is avoided by using a second level of data structures. ...

    With a set of Unit Tests as insurance.