Re: two hashes occupying the same space

There is no direct way, though it might be possible to share some of the memory by dropping to XS. I have read somewhere that hash key storage can sometimes be shared: where the same key string is used in more than one hash, Perl will (sometimes?) reuse the same copy for both hashes.

I can't actually remember where I read about this sharing--I'll update if I find it. In any case, I don't think that applies here as the keys of one would be the values of the other and vice versa.

A quick test shows that two hashes with 300,000 key/value pairs ("knnnn"/"vnnnn") require around 90 MB in total, which isn't too bad on modern machines. It may require more if your values are longer than around 12 characters each.
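A test along those lines can be sketched roughly as follows. The exact key format (`k%06d`/`v%06d`) is my assumption, and Devel::Size is a CPAN module that may not be installed, so the sketch falls back to a message if it is missing. The figure it reports will vary with Perl version and platform, and need not match the process-level number quoted above.

```perl
use strict;
use warnings;

my $PAIRS = 300_000;    # pair count taken from the text above

# Build a forward hash and its inverse, as in the original question.
my (%fwd, %rev);
for my $n (1 .. $PAIRS) {
    my $k = sprintf 'k%06d', $n;    # assumed key format
    my $v = sprintf 'v%06d', $n;    # assumed value format
    $fwd{$k} = $v;
    $rev{$v} = $k;
}

# Devel::Size is optional here; load it only if it is available.
if (eval { require Devel::Size; 1 }) {
    my $bytes = Devel::Size::total_size(\%fwd)
              + Devel::Size::total_size(\%rev);
    printf "two hashes, %d pairs each: %.1f MB\n", $PAIRS, $bytes / 2**20;
}
else {
    print "Devel::Size not installed; watch the process size instead\n";
}
```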

In this case, you may be able to save a little memory by storing references to the actual data. If a high proportion of the actual data values are say 30+ characters in length, then storing them once and putting references to them in the hashes may achieve some savings, but you'd need to experiment to find the actual breakpoints.

The saving probably wouldn't be hugely significant unless your data values are really big. It probably wouldn't be useful unless all your lookups in one are derived from the other, i.e. if you need to look up keys input from an external source, you would lose the benefits of the hash by first having to map the actual key to its reference value before being able to look it up! It also makes your algorithms considerably more complicated, having to dereference all the time.

Depending on which properties of the hashes you are using, it is possible to create fast(ish) lookup tables that use less storage. You could look at Tie::SubstrHash and/or A (memory) poor man's <strike>hash</strike> lookup table.

Update: Storing the same two hashes that took 90 MB above as Tie::SubstrHash hashes (key size=8, value size=8, table size=300,000) reduces the storage requirement to around 11 MB, at the expense of slower lookups and (the worse part) having to pad your keys and values to exactly the specified lengths.
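For illustration, here is a minimal sketch of using Tie::SubstrHash with manual padding. The small table size, the space-padding convention and the `pad` helper are my assumptions, not anything the module prescribes; the module itself only insists that every key and value be exactly the declared length.

```perl
use strict;
use warnings;
use Tie::SubstrHash;

my ($KEY_LEN, $VAL_LEN, $TABLE_SIZE) = (8, 8, 1000);    # small table for the demo

# tie arguments: key length, value length, maximum number of entries
tie my %fwd, 'Tie::SubstrHash', $KEY_LEN, $VAL_LEN, $TABLE_SIZE;
tie my %rev, 'Tie::SubstrHash', $VAL_LEN, $KEY_LEN, $TABLE_SIZE;

# Hypothetical helper: left-justify and space-pad to exactly $len characters.
sub pad {
    my ($str, $len) = @_;
    return sprintf '%-*s', $len, $str;
}

for my $n (1 .. 100) {
    my $k = pad(sprintf('k%04d', $n), $KEY_LEN);
    my $v = pad(sprintf('v%04d', $n), $VAL_LEN);
    $fwd{$k} = $v;
    $rev{$v} = $k;
}

# Lookups must use padded keys too, and the result comes back padded.
my $found = $fwd{ pad('k0042', $KEY_LEN) };
(my $trimmed = $found) =~ s/ +\z//;
print "k0042 => '$trimmed'\n";
```

Note that the padding has to be applied on every store and every fetch, which is exactly the nuisance described above.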

The latter part could be alleviated by writing your own tied hash module (or modifying Tie::SubstrHash) to perform the padding internally. I did this once, but the machine on which I did it is currently dead. It isn't too hard though, and it greatly simplifies the use of the module. Maybe that would be a worthy patch to write?
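As a rough idea of what such a wrapper might look like (this is not the lost original, just a sketch): the package name `PaddedSubstrHash` is hypothetical, it wraps a Tie::SubstrHash object rather than patching it, and it assumes that real keys and values never end in spaces, since trailing spaces are stripped on fetch. Only STORE and FETCH are covered.

```perl
use strict;
use warnings;

package PaddedSubstrHash;    # hypothetical wrapper, not a CPAN module
use Carp;
use Tie::SubstrHash;

sub TIEHASH {
    my ($class, $klen, $vlen, $tsize) = @_;
    # Delegate storage to a real Tie::SubstrHash object.
    my $inner = Tie::SubstrHash->TIEHASH($klen, $vlen, $tsize);
    return bless { inner => $inner, klen => $klen, vlen => $vlen }, $class;
}

# Space-pad on the right to exactly $len characters; refuse oversize strings.
sub _pad {
    my ($str, $len) = @_;
    croak qq{"$str" is longer than $len characters} if length($str) > $len;
    return sprintf '%-*s', $len, $str;
}

sub STORE {
    my ($self, $key, $val) = @_;
    $self->{inner}->STORE(_pad($key, $self->{klen}), _pad($val, $self->{vlen}));
}

sub FETCH {
    my ($self, $key) = @_;
    my $val = $self->{inner}->FETCH(_pad($key, $self->{klen}));
    return undef unless defined $val;
    $val =~ s/ +\z//;    # strip the padding (assumes data has no trailing spaces)
    return $val;
}

package main;

tie my %lookup, 'PaddedSubstrHash', 8, 8, 1000;
$lookup{k1}    = 'v1';       # no manual padding needed
$lookup{k0042} = 'v0042';
print "k1 => $lookup{k1}\n";
```

The caller then uses the hash with ordinary short keys and values, at the cost of one extra method call and an sprintf per access.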

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
