Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

do separate hashes share common keys?

by LanX (Sage)
on Feb 13, 2021 at 22:38 UTC ( #11128348=perlquestion: print w/replies, xml ) Need Help??

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

Question: Do different hashes optimize space by sharing duplicated strings for keys?

I did some RAM benchmarking by generating many small hashes and arrays and was irritated that the hashes only needed <30% more memory than the arrays holding the values.

Then I realized that all hashes were generated with the same 5 keys.

When I created the hashes with random keys of same length, the need for memory doubled.

Like for 100_000 hashes with 5 entries

  • ~128MB for different keys
  • ~60MB for same keys
Devel::Peek shows a flag SHAREKEYS

DB<424> Dump \%h SV = IV(0x3f1b368) at 0x3f1b378 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x3452878 SV = PVHV(0x333d148) at 0x3452878 REFCNT = 2 FLAGS = (OBJECT,OOK,SHAREKEYS) STASH = 0x31619a8 "TST" ...

these are test-samples of the generated data

\@ARGV: ["test", "-n_objs=3", "-n_attr=5", "-share_keys=0"] at d:/tmp/ +pm/bless_hash_overhead.pl line 21. \%opt: { -n_attr => 5, -n_objs => 3, -share_keys => 0 } at d:/tmp/pm/b +less_hash_overhead.pl line 22. \@types: ["test"] at d:/tmp/pm/bless_hash_overhead.pl line 23. Running Test Demo at d:/tmp/pm/bless_hash_overhead.pl line 53. Id : 14940 PM : 2945024 WS : 7933952 *** hash *** [ { "_0_852324.26186940" => 1, "_1_164918.79415432" => 1, "_2_708618.00181786" => 1, "_3_693984.59335475" => 1, "_4_318308.30209681" => 1, "_5_354558.95958874" => 1, "id" => 1, }, { "_0_941653.32000406" => 2, "_1_026454.70369051" => 2, "_2_708950.10790847" => 2, "_3_858794.53324083" => 2, "_4_112273.33593869" => 2, "_5_135182.85796952" => 2, "id" => 2, }, { "_0_926854.36297414" => 3, "_1_445275.68350328" => 3, "_2_111781.50175010" => 3, "_3_326689.53909899" => 3, "_4_069944.31695859" => 3, "_5_650963.32051237" => 3, "id" => 3, }, ] at d:/tmp/pm/bless_hash_overhead.pl line 56.
C:/Perl_524/bin\perl.exe -w d:/tmp/pm/bless_hash_overhead.pl test -n_o +bjs=3 -n_attr=5 -share_keys=1 \@ARGV: ["test", "-n_objs=3", "-n_attr=5", "-share_keys=1"] at d:/tmp/ +pm/bless_hash_overhead.pl line 21. \%opt: { -n_attr => 5, -n_objs => 3, -share_keys => 1 } at d:/tmp/pm/b +less_hash_overhead.pl line 22. \@types: ["test"] at d:/tmp/pm/bless_hash_overhead.pl line 23. Running Test Demo at d:/tmp/pm/bless_hash_overhead.pl line 53. Id : 14752 PM : 2924544 WS : 7909376 *** hash *** [ { _0_xxxxxxxxxxxxxxx => 1, _1_xxxxxxxxxxxxxxx => 1, _2_xxxxxxxxxxxxxxx => 1, _3_xxxxxxxxxxxxxxx => 1, _4_xxxxxxxxxxxxxxx => 1, _5_xxxxxxxxxxxxxxx => 1, id => 1, }, { _0_xxxxxxxxxxxxxxx => 2, _1_xxxxxxxxxxxxxxx => 2, _2_xxxxxxxxxxxxxxx => 2, _3_xxxxxxxxxxxxxxx => 2, _4_xxxxxxxxxxxxxxx => 2, _5_xxxxxxxxxxxxxxx => 2, id => 2, }, { _0_xxxxxxxxxxxxxxx => 3, _1_xxxxxxxxxxxxxxx => 3, _2_xxxxxxxxxxxxxxx => 3, _3_xxxxxxxxxxxxxxx => 3, _4_xxxxxxxxxxxxxxx => 3, _5_xxxxxxxxxxxxxxx => 3, id => 3, }, ] at d:/tmp/pm/bless_hash_overhead.pl line 56.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Replies are listed 'Best First'.
Re: do separate hashes share common keys?
by dave_the_m (Monsignor) on Feb 14, 2021 at 09:53 UTC
    Yes, each interpreter has what's called the shared string table, which is a big hash which uses every key currently in use by most hashes, using a reference counting mechanism. So if you create 1000 objects with the same key "foobar", there will be a single 'HEK' entry in the shared table for the string "foobar", and all the individual hashes will have a pointer to it.

    Note that it is (mostly) unrelated to the Copy-on-Write mechanism.

    Dave.

Re: do separate hashes share common keys?
by jcb (Parson) on Feb 14, 2021 at 03:08 UTC

    I am fairly sure that that is an optimization introduced in 5.8 to reduce memory usage with the common case of hashes used to implement objects. I believe that this optimization was part of the strategy for replacing the pseudo-hashes that were experimental in 5.6 and removed (if I recall correctly) in 5.10.

    Pseudo-hashes were an attempt at reducing the overhead for structures with named fields by allowing multiple arrays to share a hashref in position zero which mapped keys to array indices.

      Pseudo-hashes were replaced by fields , right?

      I need to include fields in my benchmarks, it's strange almost nobody seems to use them.

      Either the benefits are not big enough, or (rather) the documentation is too confusing.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Mark Jason Dominus gave a talk about the life and death of pseudohashes. (Id heard of them but had no idea what they actually were before seeing this video.)

        I infer from it that fields.pm originated as a convenient interface for pseudohashes and that after they went away fields.pm was kept around mainly for backwards compatibility and reworked to use restricted hashes.

        Id never played with fields.pm before, but I spent a few minutes with it just now and I can guess why its not very popular: it doesn't make much of an object. It doesnt auto-generate accessors or a constructor. (Code is pasted below.) The attempted method call $blurple->blue blows up because there's no blue method. Also, the copy of %params into %{$self} will blow up if you pass any parameter name other than "red", "green", or "blue".

        – Aaron
        Preliminary operational tests were inconclusive. (The damn thing blew up.)
Re: do separate hashes share common keys?
by swl (Vicar) on Feb 14, 2021 at 06:44 UTC
      thanks, but I'm confused about the "for example hash keys were already copy-on-write" comment.

      By principle hash-keys are (and must be) immutable.

      Immutable data can be shared w/o allocating new memory, while mutable objects need to be copied to separate space.

      COW is a performance trick to "lazily" delay that copy to the time when the mutation (the write) happens.

      So there must be a misunderstanding, since hash-keys are always read-only, there is never a write.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: do separate hashes share common keys?
by LanX (Sage) on Feb 14, 2021 at 19:59 UTC
    For those interested in the gory details, I found a way to display the HTML with images from illguts on metacpan

    https://st.aticpan.org/source/RURBAN/illguts-0.49/index.html

    0x20000000 SVphv_SHAREKEYS

    When this flag is set, then the hash will share the HEK structures with a special hash pointed to by the strtab variable. This reduce the storage occupied by hash keys, especially when we have lots of hashes with the same keys. The SHAREKEYS flag is on by default for newly created HVs.

    Since 5.10:strtab.image

    What is special with the strtab hash is that the val field of the HE structs is used as a reference counter for the HEK. The counter is incremented when new hashes link up this HEK and decremented when the key is removed from the hashes. When the reference count reach 0, the HEK (and corresponding HE) is removed from strtab and the storage is freed.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11128348]
Approved by Corion
Front-paged by Discipulus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2022-01-20 16:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (57 votes). Check out past polls.

    Notices?