habit_forming has asked for the wisdom of the Perl Monks concerning the following question:
It seems as though perl cannot use a stringified reference to something as a key for a hash. Below is a quick example of this. What am I doing wrong? And if I am not doing anything wrong... why does perl do what it does?
Example:
%hash = ([1,2] => "STUFF");
foreach $key (keys (%hash) )
{
    print "key |$key| => value |$hash{$key}| :".
          " dereferenced key |@{$key}|\n";
}
This prints out: key |ARRAY(0x80fbb0c)| => value |STUFF| : dereferenced key ||
In fact "ref()" does not even see the key in that hash as a reference any longer.
So I have two questions:
1. Was this behavior planned?
2. If so, to what benefit?
--habit
Re: using references as keys in a hash.
by pfaut (Priest) on Feb 23, 2003 at 02:11 UTC
Perl can use the stringified reference as a key in a hash but that's not the same thing as using the reference itself. The reference is converted to a string to use as a key. The resulting key is just a scalar.
You might want to look into Tie::RefHash.
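For what it's worth, here is a minimal sketch of how Tie::RefHash might be applied to the original example. The variable names are made up for illustration; the only thing assumed beyond the original post is that the hash is tied before anything is stored in it.

```perl
use strict;
use warnings;
use Tie::RefHash;

# Tie the hash so that keys which are references stay references.
tie my %hash, 'Tie::RefHash';

my $aref = [1, 2];          # illustrative data, same shape as the question
$hash{$aref} = "STUFF";

for my $key (keys %hash) {
    # ref($key) is 'ARRAY' here, so the key can be dereferenced again.
    print "key is a ", ref($key), " ref => value |$hash{$key}| : ",
          "dereferenced key |@{$key}|\n";
}
```

With a tied hash like this, the keys come back from keys() as real references, at the cost of the slower tie interface.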
---
print map { my ($m)=1<<hex($_)&11?' ':'';
$m.=substr('AHJPacehklnorstu',hex($_),1) }
split //,'2fde0abe76c36c914586c';
Thank you for your reply. I understand what perl is doing, but the fact that I cannot get the data back from the stringified reference is a bit... well... annoying.
It is a tradeoff for speed. Defining the standard hash such that each key is forced to be a string allows key lookups to be very fast indeed.
Since the requirement for objects (ie real, unstringified references) as keys seems to be rare, it makes sense to gain the extra speed for the common case. Since Tie::RefHash is available, the additional inconvenience for this rare case is small, so it seems still to be a good trade.
If I understand correctly, the plan for perl6 is to allow access to alternate behaviours by a different mechanism (declare a property on the hash) which will make it easier to switch in much faster implementations than perl5's tying mechanism allows, but the underlying tradeoff will remain the same - the default will be the implementation that allows fastest key lookup, ie strings.
Hugo
Well, perhaps you can. One could write a piece of XS or
Inline::C that tries to recreate the reference, by peeking
at what's at that memory address. But the value might have
been garbage collected, and then you run into trouble.
Abigail
Re: using references as keys in a hash.
by seattlejohn (Deacon) on Feb 23, 2003 at 02:15 UTC
When you stringify a reference, it no longer is a reference -- it's just a plain old string that happens to contain a human-readable representation of an address and a data type. So the code above is doing just what it would do if your initial hash assignment read:
%hash = ("ARRAY(0x80fbb0c)" => "STUFF");
If you turned on strict, the program would die with an error telling you that a string can't be used as an ARRAY ref.
Hash keys must be strings, not references. If you want to "dereference" something, you probably should be storing the reference itself as a hash value, so you can use it directly and not need to dereference it in the first place.
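To make that concrete, here is a small sketch (variable names are illustrative, not from the original post) that checks what actually ends up in the hash and what strict does when you try to dereference it. The eval is only there so the failure can be printed instead of ending the script.

```perl
use strict;
use warnings;

my $aref = [1, 2];
my %hash = ($aref => "STUFF");   # the reference is stringified to form the key

my ($key) = keys %hash;

# ref() returns an empty string for anything that is not a reference.
print "key is a plain string\n" if ref($key) eq '';

# The key has the same text as the stringified reference.
print "key matches the stringified reference\n" if $key eq "$aref";

# Under 'strict refs', using the string as an array reference is fatal.
my $ok = eval { my @list = @{$key}; 1 };
print "dereference failed: $@" unless $ok;
```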
$perlmonks{seattlejohn} = 'John Clyman';
Re: using references as keys in a hash.
by Zaxo (Archbishop) on Feb 23, 2003 at 02:14 UTC
Correct, it is not a reference any more. Hash keys are strings, and the reference has been stringified to fit. That is 'planned', so to speak, by the key hashing algorithm.
That's not to say it is useless. References to distinct variables make fine keys, guaranteed to be unique.
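One way this can be put to use, sketched here with made-up data: a %seen hash keyed on the stringified reference tells apart two arrays with identical contents, because each live array stringifies to its own address.

```perl
use strict;
use warnings;

my $x = [1, 2];
my $y = [1, 2];              # same contents as $x, but a distinct array
my @items = ($x, $y, $x);    # $x appears twice

# The stringified reference serves as an identity key.
my %seen;
my @unique = grep { !$seen{$_}++ } @items;

print scalar(@unique), " distinct arrays\n";
```

This only compares identity, not contents, and it relies on all the referenced variables staying alive while %seen is in use.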
After Compline, Zaxo
They are only unique as long as the variable(s) exist.
If they are garbage collected, Perl will reuse the memory
for it, and a new reference could stringify to
the same value as the old one.
Abigail
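A rough sketch of the situation described above, with illustrative names. Whether the second array really lands on the freed address depends on the perl build and on what else has been allocated, so the script reports either outcome rather than assuming one.

```perl
use strict;
use warnings;

my %seen;
{
    my $temp = [1, 2];
    $seen{$temp} = 1;     # only the string form is stored; it does not
                          # keep the array alive
}
# The array $temp pointed to has been freed here, but its key remains.

my $other = [3, 4];       # may or may not be allocated at the old address
if (exists $seen{$other}) {
    print "address was reused: $other looks like it was seen already\n";
} else {
    print "no reuse this time, but nothing guarantees that\n";
}
```

Storing the reference itself as the hash value keeps the referent alive and avoids the problem.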
Re: using references as keys in a hash.
by jonadab (Parson) on Feb 23, 2003 at 12:38 UTC
Others have pointed out why this is the way it is and
pointed you toward modules that will work around it,
but I'd like to point out a simpler, more blindingly
obvious solution: Go ahead and use the reference as
your key, but also store it as a value (in addition
to whatever other values you are storing).
There are two ways to do this, and which one you pick
is a matter of style. You can use parallel hashes,
with the same key across two or more hashes returning
a related set of values, or you can use a nested hash.
The latter is easier to make look similar to what you
have in your code...
%nestedhash = {
$someref => { ref => $someref, val => "STUFF", },
$anotherref => { ref => $anotherref, val => stuff(), },
}
Though the way hashes tend to be used in the real world,
you're more likely to end up with something more like
this...
while (($ref, $val) = get_pair()) {
my %thisrecord = { ref => $ref, val => $val };
$record{$ref} = \%thisrecord;
}
Personally, I tend to use parallel hashes, which
accomplishes roughly the same thing in a slightly
different way, like so...
while (($r, $v) = get_pair()) {
$ref{$r}=$r; $val{$r}=$v;
}
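A self-contained sketch of the parallel-hash variant, for anyone who wants to run it. The @pairs list stands in for get_pair(), which is not defined anywhere in this thread, so treat the data as an assumption.

```perl
use strict;
use warnings;

# Stand-in for get_pair(): each entry is [ reference, value ].
my @pairs = (
    [ [1, 2],        "STUFF" ],
    [ { a => 1 },    "MORE STUFF" ],
);

my (%ref, %val);
for my $pair (@pairs) {
    my ($r, $v) = @$pair;
    $ref{$r} = $r;    # key is the stringified reference, value is the real one
    $val{$r} = $v;
}

for my $key (keys %val) {
    # The real reference is recovered from the parallel hash.
    print "$val{$key} belongs to a ", ref($ref{$key}), " reference\n";
}
```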
sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$
v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while
($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H=
"16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab
Personally, I tend to use parallel hashes, ...
If two data structures are related, make that relationship OBVIOUS. Parallel data structures are not obviously related. In fact, it's a maintenance nightmare.
Let's set up a thought experiment. There are four parallel data structures. It doesn't matter at all what they are, except they have the following properties:
- A set of config-type parameters
- Modified everywhere (whether global or passed around)
- Within the fubar() function, only three are referenced. (Since every developer knows that the four are parallel, there's no commenting to mention the fourth.)
I am your maintenance programmer. I come along and am told there is a bug in the fubar() function and I need to fix it in 24 hrs. I go and realize that I need this value to make it right. I don't know that the value is in this fourth data structure. But, I need to fix fubar() right now. So, I add some crazy structure to get that fourth value into fubar(). The code is now worse.
All of that is avoided by using a second level of data structures. Thus, this set of config-type parameters is handed around as one reference. I, the hapless maintainer, am shown by the very way the data is structured that my needed value is there for me already. I don't need to hack the code up and make my job harder, just to do my job.
(And, in case you're thinking that this is a contrived thought experiment ... maintenance programmers are often given that exact task, with about that level of knowledge about the system. It's not a perfect world out there. It is our job as developers to think about the maintainer who will come after us. You will maintain at some point in your career and will thank the developer with forethought.)
(If you think your code won't be maintained, remember this - that's what the mainframe developers in the 1970's thought when they used 2-digit years. I mean, who's going to keep this code around for 30(!) years?)
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
If two data structures are related, make that relationship OBVIOUS.
I agree with that.
Parallel data structures are not obviously related.
It seems obvious to me that if you see them assigned
together, they're related. I did say it was a matter of
style, however, and I expected some people to have a
strong preference for the nested structures. I do use
the nested structures in some cases, when what I want to
do is a little more complex, or if there are multiple
levels of nesting, or some other good reason. And I gave
the example of using nested structures first. Don't read
more into my statement about parallel structures than is
there.
In fact, it's a maintenance nightmare.
Let's set up a thought experiment.
Thought experiments can lead you to conclude that
a heavier object will always fall faster than a
lighter one. (They can also be useful, but you
have to take them cum grano salis.)
I am your maintenance programmer.
Oooh, oooh, can I imagine that I named all my
variables with single characters and used recursive
nested evals wherever possible? ;-)
I come along and am
told there is a bug in the fubar() function and I need
to fix it in 24 hrs. I go and realize that I need this
value to make it right. I don't know that the value is
in this fourth data structure. But, I need to fix
fubar() right now. So, I add some crazy structure to
get that fourth value into fubar(). The code is now
worse.
The code will always be worse when someone who is not
familiar with the code attempts to fix something
right now without understanding how it works. No
amount of wonderful data structure will change that.
(This is not an argument for bad data structures;
I'm merely pointing out that no data structure can
prevent the scenario you describe.)
Furthermore, unless I'm missing something, there's
nothing magic about the syntax of nesting that will
alert the unfamiliar programmer to the existence of
more data than is being used in the piece of code
he's viewing. A simplistic example...
sub foobar {
    my ($object, $result);
    foreach $object (@_) {
        $result .= "Title:\t" . $$object{title} ."\n"
                 . "Author:\t" . $$object{author} ."\n"
                 . "-------------------------------\n";
    }
    return $result;
}
Will the programmer know to look in $$object{ISBN}
for the piece of data he needs to fulfill the change
request? Maybe, but if so it's not any more obvious
than (with parallel structures) looking in $isbn{$key}.
If he reads through the well-commented code, he'll
find it either way.
Of course, if the code is more complex and has a
larger number of fields, then the nested structure
can be traversed more efficiently, avoiding the bug
in the first place...
sub foobar {
    my ($object, $result, $f);
    foreach $object (@_) {
        foreach $f (sort @fields) {
            $result .= "$f:\t" . $$object{$f} ."\n";
        }
        $result .= "-------------------------------\n";
    }
    return $result;
}
But the original poster is talking about what is
currently a single hash storing a single value for
each key, and I was suggesting also storing the
unstringified reference used to create the hash
key. That's a total of two fields: not complex
enough to really need the nested structure, IMO.
Yes, the nested structure will solve the problem
nicely, but the parallel structure will also work.
Note that I'm not saying that parallel structures are
better, or even that they're as good in every case;
I only said that which you use is a matter of style.
The program will get the same result either way.
sub H{$_=shift;while($_){$c=0;while(s/^2//){$c++;}s/^4//;$
v.=(' ','|','_',"\n",'\\','/')[$c]}$v}sub A{$_=shift;while
($_){$d=hex chop;for(1..4){$pl.=($d%2)?4:2;$d>>=1}}$pl}$H=
"16f6da116f6db14b4b0906c4f324";print H(A($H)) # -- jonadab
Ironically, your code snippets above try to use references as keys, although unintentionally. The lines I'm referring to are
%nestedhash = {
and
my %thisrecord = { ref => $ref, val => $val };
Both should be using parentheses instead of curly brackets.
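For illustration, here is how the nested-hash assignment might look with parentheses. The two references and their values are made up. With curly brackets the right-hand side is a single hash reference, so the hash ends up with one stringified key, and under warnings perl reports "Reference found where even-sized list expected".

```perl
use strict;
use warnings;

my $someref    = [1, 2];
my $anotherref = { name => 'example' };

# Parentheses build a list of key/value pairs for the hash.
my %nestedhash = (
    $someref    => { ref => $someref,    val => "STUFF" },
    $anotherref => { ref => $anotherref, val => "MORE STUFF" },
);

for my $key (keys %nestedhash) {
    my $record = $nestedhash{$key};
    # The real reference is stored in the record, next to the value.
    print "$key => $record->{val} (a ", ref($record->{ref}), " ref)\n";
}
```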
ihb
Why use references as keys in a hash?
by Solo (Deacon) on Feb 23, 2003 at 14:32 UTC
We've seen a few means of using references as hash keys. That's pretty cool, from a theoretical standpoint--but I was wondering what the practical application could be.
What are examples of using references as hash keys to make problems easier?
My example: suppose I wanted to keep a hash of my usernames for websites, keyed by the prepared HTTP request for each site.
use Tie::RefHash;
use HTTP::Request;
use Data::Dumper;
$r1 = HTTP::Request->new(GET => 'http://www.perlmonks.org/');
$r2 = HTTP::Request->new(GET => 'http://use.perl.org/');
# Etc...
tie %sites, 'Tie::RefHash', ($r1, 'Solo', $r2, 'Solo');
print Dumper(\%sites);
It seems too trivial, but that's probably just my lack of vision ;)
--Solo
--
I think my eyes are getting better. Instead of a big dark blur, I see a big light blur.