http://qs1969.pair.com?node_id=1218965

sezal has asked for the wisdom of the Perl Monks concerning the following question:

Perl internally uses a dedicated hash, PL_strtab, as shared storage for hash keys, but in a forking environment like apache/mod_perl this creates a big issue. Best practice says to preload modules in the parent process, but nobody mentions that doing so also allocates memory for PL_strtab, and that these pages of memory tend to be implicitly modified in child processes. There seem to be 2 major reasons for the modification:

Reason 1: reallocation (hsplit()) may happen when PL_strtab grows in a child process.

Reason 2: a key's REFCNT is modified every time a new reference to that key is created.

The example below shows a 16MB copy-on-write leak caused by merely using a hash. Attempts to recompile perl with -DNODEFAULT_SHAREKEYS fail (https://rt.perl.org/SelfService/Display.html?id=133384). I was able to get access to PL_strtab via an XS module, and ideally I'm looking for a way to downgrade all hashes created in the parent so that they keep their keys within the hash itself (the HE object) rather than in PL_strtab, i.e. turn off the SHAREKEYS flag. This should allow shrinking PL_strtab to the minimum possible size; ideally it should have 0 keys in the parent. Please let me know whether you think this is theoretically possible via XS.

#!/usr/bin/env perl
use strict;
use warnings;
use Linux::Smaps;

$SIG{CHLD} = sub { waitpid(-1, 1) };

# comment this block
{
    my %h;
    # pre-growth PL_strtab hash, kind of: keys %$PL_strtab = 2_000_000 +;
    foreach my $x (1 .. 2_000_000) {
        $h{$x} = undef;
    }
}

my $pid = fork // die "Cannot fork: $!";
unless ($pid) {
    # child
    my $s = Linux::Smaps->new($$)->all;
    my $before = $s->shared_clean + $s->shared_dirty;
    {
        my %h;
        foreach my $x (1 .. 2_000_000) {
            $h{$x} = undef;
        }
    }
    my $s2 = Linux::Smaps->new($$)->all;
    my $after = $s2->shared_clean + $s2->shared_dirty;
    warn 'COPY-ON-WRITE: ' . ($before - $after) . ' KB';
    exit 0;
}

sleep 1000;
print "DONE\n";
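For readers without Linux::Smaps installed, the shared-memory figure used above can be approximated by reading /proc/PID/smaps directly. This is only a minimal sketch, assuming a Linux kernel that exposes Shared_Clean and Shared_Dirty lines in kB; the helper names shared_kb and shared_kb_for_pid are made up for illustration and are not part of any module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sum the Shared_Clean and Shared_Dirty fields (in kB)
# found in smaps-format text. Other fields are ignored.
sub shared_kb {
    my ($smaps_text) = @_;
    my $total = 0;
    while ($smaps_text =~ /^Shared_(?:Clean|Dirty):\s+(\d+)\s+kB\s*$/mg) {
        $total += $1;
    }
    return $total;
}

# Hypothetical helper: slurp /proc/PID/smaps and sum its shared pages.
# Linux only; dies if the file cannot be opened.
sub shared_kb_for_pid {
    my ($pid) = @_;
    open my $fh, '<', "/proc/$pid/smaps"
        or die "Cannot open smaps for $pid: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return shared_kb($text);
}

# Only attempt the live measurement where smaps is actually available.
if (-r "/proc/$$/smaps") {
    printf "shared: %d kB\n", shared_kb_for_pid($$);
}
```

Calling shared_kb_for_pid($$) in the child before and after filling the hash, then subtracting, should give a number comparable to the COPY-ON-WRITE value printed by the script above.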

Replies are listed 'Best First'.
Re: PL_strtab/SHAREKEYS and copy-on-write leak
by dave_the_m (Monsignor) on Jul 20, 2018 at 21:59 UTC
    NODEFAULT_SHAREKEYS as a build option is undocumented, untested, and has probably suffered much bitrot over the years.

    Note that if you disable the shared key hash, then every object will use its own storage for every hash key. For example if a process creates 1000 objects, each being a hash with the same fixed set of keys (but differing values of course), then that process will store 1000 copies of each key.
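A back-of-envelope sketch of that trade-off, counting raw key bytes only. It deliberately ignores per-entry overhead (HEK headers, refcounts, allocator padding), and the key names, the object count and the function name key_storage_bytes are all made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns (bytes with shared keys, bytes without sharing)
# for $objects hashes that all use the same set of keys.
# Rough estimate: only the key characters themselves are counted.
sub key_storage_bytes {
    my ($objects, @keys) = @_;
    my $per_set = 0;
    $per_set += length($_) for @keys;

    # Shared: each distinct key string is stored once in the shared table.
    # Unshared: every hash carries its own copy of every key.
    return ($per_set, $per_set * $objects);
}

# Illustrative key set and object count, not taken from any real application.
my ($shared, $unshared) = key_storage_bytes(1000, qw(id name email created_at));
printf "shared keys:   %d bytes\n", $shared;
printf "unshared keys: %d bytes\n", $unshared;
```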

    I doubt that you can safely and reliably shrink PL_strtab via XS.

    Dave.

      Note that if you disable the shared key hash, then every object will use its own storage for every hash key. For example if a process creates 1000 objects, each being a hash with the same fixed set of keys (but differing values of course), then that process will store 1000 copies of each key.

      But on the other hand, using shareable keys on data hashes where sharing is unlikely gives you overhead. If Devel::Size is correct then sharing a key adds 24 bytes.

      Maybe it would be nice if it were possible to decide per hash whether to share keys or not. Or even nicer: if Perl objects weren't based on hashes, key sharing wouldn't be necessary at all.

        The plan at some point is to introduce hash vtables. This will allow hashes to have varying implementations, which then opens up such possibilities.

        Dave.

      Dave,

      My attempt to disable DEFAULT_SHAREKEYS was no more than an attempt to prove my assumptions regarding PL_strtab. I understand it's not a good idea for production.

      I'm also not sure that shrinking PL_strtab (even if it's possible) will help. Perl may not release the memory to the OS, so copy-on-write will still occur.

      From what I see in sv.c and util.c Perl doesn't provide any API to dynamically enable/disable DEFAULT_SHAREKEYS for new hashes. :(

      Thank you.
Re: PL_strtab/SHAREKEYS and copy-on-write leak
by haukex (Archbishop) on Jul 20, 2018 at 22:38 UTC

    Crossposted to StackOverflow. Crossposting is acceptable, but it is considered polite to inform about it so that efforts are not duplicated.

      Sorry about that. This is my first post; I will keep it in mind next time.
Re: PL_strtab/SHAREKEYS and copy-on-write leak
by eserte (Deacon) on Aug 16, 2018 at 13:33 UTC
    Here's a gist with a small XS module providing a function to turn off hash key sharing per hash. Maybe this will help if you have mostly data-like hashes (unlike hashes for objects), which you have "under control" and can be marked as non-shareable before filling them.