I don't know whether you have found what you were looking for, but here is another way to save a little space: hashes use the stringified value of their keys, so most 64-bit integers will be stored as 18 bytes or more*. If you pack them first, each key takes only 8 bytes.
use feature 'say'; use Devel::Size 'total_size';
$i=1; $H{ 2**52 + $i++ } = $_, $G{ pack "Q", 2**52 + $i++ } = $_ for "aaaa".."zzzz";
say total_size(\%H); say total_size(\%G)
__DATA__
63601240
59945432
This shouldn't change much for values, though it would keep the scalars from holding both a string and an integer value.
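To make the key trick a bit more concrete, here is a minimal sketch of the round trip. The helper names pack_key and unpack_key are made up for illustration, and it assumes a perl built with 64-bit integers (the "Q" template needs that).

```perl
use strict;
use warnings;

# Hypothetical helpers, not from any module.
# "Q" packs an unsigned 64-bit integer into 8 raw bytes (native byte order).
sub pack_key   { return pack   "Q", $_[0] }
sub unpack_key { return unpack "Q", $_[0] }

my $n = 4503599627370497;            # 2**52 + 1, written as an integer literal

printf "stringified key: %d bytes\n", length "$n";
printf "packed key:      %d bytes\n", length pack_key($n);

# Store under the packed key, then recover the number when iterating.
my %G = ( pack_key($n) => "some value" );
for my $key (keys %G) {
    printf "recovered key: %s\n", unpack_key($key);
}
```

The price is that you have to unpack every key before you can print it or compare it numerically, and the packed bytes depend on the machine's byte order.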
On my computer:
perl -E '$a = 2**53+1; $b = 2**53+2; say "Hello" if $a != $b and $a eq $b; say "End"'
Hello
End
perl -Mbignum -E '$a = 2**53+1; $b = 2**53+2; say "Hello" if $a != $b and $a eq $b; say "End"'
End
(numbers past a certain point are stringified to the same thing)
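A rough way to see the collision without depending on how a given perl was built: format both numbers the way a floating-point value is usually stringified, with 15 significant digits. The "%.15g" format and the helper name nv_string are my assumptions for this sketch; your perl may use a different precision.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic the usual stringification of a floating-point
# number (assumed here to be 15 significant digits).
sub nv_string { return sprintf "%.15g", $_[0] }

my $x = 2**53 + 1;
my $y = 2**53 + 2;

print "x as float string: ", nv_string($x), "\n";
print "y as float string: ", nv_string($y), "\n";
print "same string\n" if nv_string($x) eq nv_string($y);
```

If two such numbers end up as hash keys in their floating-point form, they land on the same key.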
Think: typical variable names!
If that's \w+, that's 63 possible values per byte (just under 6 bits), so there is probably room for compression here as well, maybe even Huffman coding. But if most of those strings are at most 8 bytes long, I'm not sure there would be an actual improvement (though in this case it would save space in both the keys and the values).
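Here is a rough sketch of the simplest version of that idea: a fixed 6 bits per character rather than Huffman coding. It assumes ASCII-only \w characters; the names pack6 and unpack6 and the choice of code 0 as padding are mine. I have not measured whether it actually saves memory once perl's per-scalar overhead is counted.

```perl
use strict;
use warnings;

# 63 symbols get codes 1..63; code 0 is reserved to mark padding.
my @alphabet = ('_', 0 .. 9, 'A' .. 'Z', 'a' .. 'z');
my %code;
@code{@alphabet} = (1 .. 63);
my @char = (undef, @alphabet);

# Hypothetical helper: encode an ASCII \w+ string at 6 bits per character.
sub pack6 {
    my ($word) = @_;
    die "only ASCII word characters are supported\n"
        unless $word =~ /\A[A-Za-z0-9_]+\z/;
    my $bits = join '', map { sprintf "%06b", $code{$_} } split //, $word;
    return pack "B*", $bits;          # the last byte is padded with zero bits
}

# Hypothetical helper: decode a string produced by pack6.
sub unpack6 {
    my ($packed) = @_;
    my $bits = unpack "B*", $packed;
    my $word = '';
    while ($bits =~ /\G([01]{6})/g) {
        my $c = oct "0b$1";
        last if $c == 0;              # an all-zero group is padding
        $word .= $char[$c];
    }
    return $word;
}

printf "%s -> %d bytes -> %s\n", $_, length pack6($_), unpack6(pack6($_))
    for qw(aaaa zzzz foo_Bar9);
```

An 8-character name goes from 8 bytes to 6, which is why I doubt the gain is worth it for short strings.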
* Numbers below 2^27 may have fewer than 4 digits in base 10, but that range is only one number in every 2^37. Numbers above 2^30 are all shorter once packed than stringified, and there are 2^34 times as many numbers above that point as below it.