in reply to using references as keys in a hash.

Others have pointed out why this is the way it is and pointed you toward modules that will work around it, but I'd like to point out a simpler, more blindingly obvious solution: Go ahead and use the reference as your key, but also store it as a value (in addition to whatever other values you are storing).
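
To make the idea concrete, here is a minimal sketch (the variable names and sample data are made up for illustration). A hash key is always a plain string, so the key by itself cannot be dereferenced, but a copy of the reference stored as a value can:

```perl
use strict;
use warnings;

my $someref = [ 1, 2, 3 ];      # hypothetical data, just for illustration

my %seen;
$seen{$someref} = $someref;     # key is stringified; value stays a reference

my ($key) = keys %seen;

# ref() returns an empty string for the key: it is only a string now,
# something like "ARRAY(0x...)".
my $key_is_ref = ref($key) ? "yes" : "no";
print "key is a reference? $key_is_ref\n";

# The value stored under that key is still a usable reference.
my $real = $seen{$key};
print "elements: @$real\n";
```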

There are two ways to do this, and which one you pick is a matter of style. You can use parallel hashes, with the same key across two or more hashes returning a related set of values, or you can use a nested hash. The latter is easier to make look similar to what you have in your code...

    %nestedhash = {
        $someref    => { ref => $someref,    val => "STUFF", },
        $anotherref => { ref => $anotherref, val => stuff(), },
    }

Though the way hashes tend to be used in the real world, you're more likely to end up with something more like this...

    while (($ref, $val) = get_pair()) {
        my %thisrecord = { ref => $ref, val => $val };
        $record{$ref} = \%thisrecord;
    }

Personally, I tend to use parallel hashes, which accomplish roughly the same thing in a slightly different way, like so...

    while (($r, $v) = get_pair()) {
        $ref{$r} = $r;
        $val{$r} = $v;
    }

sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$ v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while ($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H= "16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab

Parallel structures are NOT maintainable
by dragonchild (Archbishop) on Feb 23, 2003 at 15:51 UTC
    Personally, I tend to use parallel hashes, ...

    If two data structures are related, make that relationship OBVIOUS. Parallel data structures are not obviously related. In fact, it's a maintenance nightmare.

    Let's set up a thought experiment. There are four parallel data structures. It doesn't matter at all what they are, except they have the following properties:

    • A set of config-type parameters
    • Modified everywhere (whether global or passed around)
    • Within the fubar() function, only three are referenced. (Since every developer knows that the four are parallel, there's no commenting to mention the fourth.)
    I am your maintenance programmer. I come along and am told there is a bug in the fubar() function and I need to fix it in 24 hrs. I dig in and realize that I need one more value to make it right. I don't know that the value is in this fourth data structure. But I need to fix fubar() right now. So I add some crazy structure to get that fourth value into fubar(). The code is now worse.

    All of that is avoided by using a second level of data structures. Thus, this set of config-type parameters is handed around as one reference. I, the hapless maintainer, am shown by the very way the data is structured that my needed value is there for me already. I don't need to hack the code up and make my job harder, just to do my job.

    (And, in case you're thinking that this is a contrived thought experiment ... maintenance programmers are often given that exact task, with about that level of knowledge about the system. It's not a perfect world out there. It is our job as developers to think about the maintainer who will come after us. You will maintain at some point in your career and will thank the developer with forethought.)

    (If you think your code won't be maintained, remember this: that's what the mainframe developers in the 1970s thought when they used 2-digit years. I mean, who's going to keep this code around for 30(!) years?)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      If two data structures are related, make that relationship OBVIOUS.

      I agree with that.

      Parallel data structures are not obviously related.

      It seems obvious to me that if you see them assigned together, they're related. I did say it was a matter of style, however, and I expected some people to have a strong preference for the nested structures. I do use the nested structures in some cases, when what I want to do is a little more complex, or if there are multiple levels of nesting, or some other good reason. And I gave the example of using nested structures first. Don't read more into my statement about parallel structures than is there.

      In fact, it's a maintenance nightmare. Let's set up a thought experiment.

      Thought experiments can lead you to conclude that a heavier object will always fall faster than a lighter one. (They can also be useful, but you have to take them cum grano salis.)

      I am your maintenance programmer.

      Oooh, oooh, can I imagine that I named all my variables with single characters and used recursive nested evals wherever possible? ;-)

      I come along and am told there is a bug in the fubar() function and I need to fix it in 24 hrs. I dig in and realize that I need one more value to make it right. I don't know that the value is in this fourth data structure. But I need to fix fubar() right now. So I add some crazy structure to get that fourth value into fubar(). The code is now worse.

      The code will always be worse when someone who is not familiar with the code attempts to fix something right now without understanding how it works. No amount of wonderful data structure will change that. (This is not an argument for bad data structures; I'm merely pointing out that no data structure can prevent the scenario you describe.)

      Furthermore, unless I'm missing something, there's nothing magic about the syntax of nesting that will alert the unfamiliar programmer to the existence of more data than is being used in the piece of code he's viewing. A simplistic example...

          sub foobar {
              my ($object, $result);
              foreach $object (@_) {
                  $result .= "Title:\t"  . $$object{title}  . "\n"
                           . "Author:\t" . $$object{author} . "\n"
                           . "-------------------------------\n";
              }
              return $result;
          }

      Will the programmer know to look in $$object{ISBN} for the piece of data he needs to fulfill the change request? Maybe, but if so it's not any more obvious than (with parallel structures) looking in $isbn{$key}. If he reads through the well-commented code, he'll find it either way.

      Of course, if the code is more complex and has a larger number of fields, then the nested structure can be traversed more efficiently, avoiding the bug in the first place...

          sub foobar {
              my ($object, $result, $f);
              foreach $object (@_) {
                  foreach $f (sort @fields) {
                      $result .= "$f:\t" . $$object{$f} . "\n";
                  }
                  $result .= "-------------------------------\n";
              }
              return $result;
          }

      But the original poster is talking about what is currently a single hash storing a single value for each key, and I was suggesting also storing the unstringified reference used to create the hash key. That's a total of two fields: not complex enough to really need the nested structure, IMO. Yes, the nested structure will solve the problem nicely, but the parallel structure will also work.

      Note that I'm not saying that parallel structures are better, or even that they're as good in every case; I only said that which you use is a matter of style. The program will get the same result either way.
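
      For the two-field case under discussion, a minimal sketch of both layouts side by side. The get_pair() here is a hypothetical stand-in that feeds two made-up pairs; the real one could be anything that returns a reference and a value.

      ```perl
      use strict;
      use warnings;

      # Hypothetical stand-in for get_pair(): hands back (reference, value)
      # pairs until the queue is empty, then an empty list.
      my @queue = ( [ [ 'x' ], 'first' ], [ { y => 1 }, 'second' ] );
      sub get_pair { my $p = shift @queue; return $p ? @$p : (); }

      my (%ref, %val);    # parallel layout: one hash per field
      my %record;         # nested layout: one anonymous hash per key

      while ( my ($r, $v) = get_pair() ) {
          # Parallel: the same (stringified) key is used in each hash.
          $ref{$r} = $r;
          $val{$r} = $v;

          # Nested: every field for this key lives in one place.
          $record{$r} = { ref => $r, val => $v };
      }

      # Either layout gives back the same reference and the same value.
      for my $key (sort keys %record) {
          my $same = ( $ref{$key} == $record{$key}{ref}
                    && $val{$key} eq $record{$key}{val} ) ? "same" : "different";
          print "$val{$key}: $same\n";
      }
      ```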


      sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$ v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while ($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H= "16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab
        A few comments:
        1. I'm merely pointing out that no data structure can prevent the scenario you describe.

          No, but it can mitigate it. I am saying that, all things being equal, nested data structures contain more information intrinsic to their structure than parallel ones do.

        2. Will the programmer know to look in $$object{ISBN} for the piece of data he needs to fulfill the change request?

          Yes, he/she will. Why? Because they can go look at the definition for $object and see that there is an ISBN member. (You do have object definitions, right?) Failing that, Data::Dumper is one of the maintenance programmer's best friends. But, Data::Dumper doesn't know about that parallel data structure, does it? If it doesn't, it takes more time for me to find the right solution.

          (Nit: Use $object->{ISBN} instead ... $$object{ISBN} can lead to subtle bugs. Another maintenance headache, not a style issue.)

        3. The program will get the same result either way.

          This is a horrible statement, especially in this argument. Not only is it a tautology, but I don't care what hoops the computer has to go through to understand what I want it to do. COMPUTER RESOURCES ARE (NEARLY ALWAYS) CHEAPER THAN HUMAN RESOURCES.

        Let me put that last point another way. My programming services cost more per week than a top-of-the-line Linux server. People like merlyn and others cost at least twice that. Is it worth it to you for me to spend 40 hours figuring out how to save a meg of RAM?

        It is worth a lot of money to write code that encapsulates as much information as possible in as many ways as possible that will be guaranteed to change as the code changes. Parallel data structures are a matter of style, yes. They are a poor choice of style because they will cost more money in maintenance than nested structures.
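
        To illustrate point 2, a minimal sketch using the core Data::Dumper module. The record contents are placeholder values invented for this example; only the field names come from the thread.

        ```perl
        use strict;
        use warnings;
        use Data::Dumper;

        $Data::Dumper::Sortkeys = 1;    # stable key order in the dump
        $Data::Dumper::Indent   = 1;    # compact indentation

        # Hypothetical record with placeholder values.
        my $object = {
            title  => 'Example Title',
            author => 'A. N. Author',
            ISBN   => '0-00-000000-0',
        };

        # Dumping the one reference that foobar() receives shows every field,
        # including the ISBN that foobar() itself never mentions.
        my $dump = Dumper($object);
        print $dump;

        # The arrow form of the field access suggested in the nit above.
        print "ISBN: $object->{ISBN}\n";
        ```

        A parallel %isbn hash would not show up in this dump, because nothing in $object points to it.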

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      All of that is avoided by using a second level of data structures. ...

      With a set of Unit Tests as insurance.

Re: Re: using references as keys in a hash.
by ihb (Deacon) on Feb 24, 2003 at 13:21 UTC

    Ironically, your code snippets above try to use references as keys, although unintentionally. The lines I'm referring to are

        %nestedhash = {

    and

        my %thisrecord = { ref => $ref, val => $val };

    Both should be using parentheses instead of curly brackets.
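
    A minimal sketch of the difference, with made-up sample data. Curly brackets build an anonymous hash and yield a single reference to it, so assigning that to a hash produces one stringified key with no value; parentheses yield the list of key/value pairs that a hash assignment expects.

    ```perl
    use strict;
    use warnings;

    my ($ref, $val) = ( [ 'a', 'b' ], 'STUFF' );    # hypothetical sample pair

    # Curly brackets: one hash reference is assigned, so it becomes the
    # only key (as a string) and its value is undef.
    my %wrong;
    {
        no warnings;    # hide "Reference found where even-sized list expected"
        %wrong = { ref => $ref, val => $val };
    }
    my ($only_key) = keys %wrong;
    print "curly brackets: one key that looks like $only_key\n";

    # Parentheses: a plain list of key/value pairs.
    my %right = ( ref => $ref, val => $val );
    print "parentheses: keys are ", join(", ", sort keys %right), "\n";
    ```

    With warnings left on, Perl reports the first form at run time, which is usually the quickest way to catch it.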

    ihb

      Quite so. That'll teach me to post without testing.


      sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$ v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while ($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H= "16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab