"Auto" vivifying shared data structures

Dr. Mu has asked for the wisdom of the Perl Monks concerning the following question:

Conclusion: johnnywang's suggestion to use freeze and thaw from the Storable module is what I've settled on. I'm not sure what the consequences will be in terms of processing overhead, but they can't be worse than recursively sharing every node in a data structure using pure Perl. I hope that the maintainer(s) of the threads::shared pragma will come to realize that users need more than the single-level hashes currently provided.

I'm writing an app using threads and threads::shared under ActivePerl 5.8.4. One of the major shortcomings of data structures shared among threads is that each element has to be shared explicitly, and inserting references to unshared data results in an error. In order to overcome this limitation, I've attempted to write a PutShare subroutine, which takes a reference to a data structure, a string of literal keys/subscripts, and a leaf value, and attempts to insert the requisite nodes in such a way that all elements are properly shared. For new nodes, whether to create a hash key or a subscript is determined by whether the literal is an integer or something else -- rather like the way Template Toolkit does it. Herewith is my (recursive) subroutine:

sub PutShare {
    print "Args: ", join(' | ', @_), "\n";
    my $shr = shift;
    my $shrref = ref $shr;
    while ($shrref eq 'REF') {
        print "  Dereferencing $shr ";
        $shr = $$shr;
        print "to get $shr\n";
        $shrref = ref $shr
    }
    if (@_ > 1) {
        my $key = shift;
        print "  Key = $key\n";
        if ($shrref eq 'ARRAY') {
            print "  Adding/using '$key' of existing array.\n";
            PutShare(\($shr->[int($key)]), @_)
        } elsif ($shrref eq 'HASH') {
            print "  Adding/using '$key' of existing hash.\n";
            PutShare(\$shr->{$key}, @_)
        } elsif ($key =~ /^\s*[-+]?\d+$/) {
            my $new = &share([]);
            print "  Adding '$key' to new array.\n";
            PutShare(\$new->[int($key)], @_);
            $$shr = $new
        } else {
            my $new = &share({});
            print "  Adding '$key' to new hash: $new\n";
            PutShare(\$new->{$key}, @_);
            $$shr = $new
        }
    } else {
        my $val = shift;
        print "  Value = '$val'\n";
        my $valref = ref $val;
        if ($valref eq 'ARRAY') {
            my $new = &share([]);
            @$new = @$val;
            $$shr = $new
        } elsif ($valref eq 'HASH') {
            my $new = &share({});
            %$new = %$val;
            $$shr = $new
        } else {        
            $$shr = $val;
        }
        print "\n";
    }
}

That this subroutine is flawed should be obvious from the get-go: that dereferencing loop shouldn't have to be there. So I've blundered in the way I add new nodes. But the fact is that it works the way I want, when it's not being used on a shared hash.

Here's the sample code I used to test it:

use strict;
use warnings;
use threads;
use threads::shared;
use Data::Dump qw/dump/;

my %User;
PutShare(\%User, qw/deep fish tuna/);
PutShare(\%User, qw/deep deeper fish halibut/);
print "The resulting leaves are: '$User{deep}{fish}' and '$User{deep}{deeper}{fish}'\n";
print "\nDump: ", dump(\%User);

If I comment out the use threads line (turning share into a no-op), the output I get is this:

Args: HASH(0x18241f8) | deep | fish | tuna
  Key = deep
  Adding/using 'deep' of existing hash.
Args: SCALAR(0x224fa4) | fish | tuna
  Key = fish
  Adding 'fish' to new hash: HASH(0x1890b98)
Args: SCALAR(0x1890ba4) | tuna
  Value = 'tuna'

Args: HASH(0x18241f8) | deep | deeper | fish | halibut
  Key = deep
  Adding/using 'deep' of existing hash.
Args: REF(0x224fa4) | deeper | fish | halibut
  Dereferencing REF(0x224fa4) to get HASH(0x1890b98)
  Key = deeper
  Adding/using 'deeper' of existing hash.
Args: SCALAR(0x1890bb0) | fish | halibut
  Key = fish
  Adding 'fish' to new hash: HASH(0x18af2c0)
Args: SCALAR(0x18af2cc) | halibut
  Value = 'halibut'

The resulting leaves are: 'tuna' and 'halibut'

Dump: { deep => { deeper => { fish => "halibut" }, fish => "tuna" } }

But with the use threads active, I get essentially the same output except for the dump:

Dump: { deep => { deeper => { fish => '#LVALUE#' }, fish => "tuna" } }

And dump complains about the lvalue.

Anyway, I'm not there yet. Perhaps the Monks can see some detail I've overlooked.
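
One possible reworking, offered only as a minimal sketch and not as a fix confirmed anywhere in this thread: walk the path iteratively and always hold a reference to the containing hash, never to one of its elements, so the REF-dereferencing loop is not needed. The name put_shared is hypothetical, the sketch handles hash keys only, and the leaf is assumed to be a plain scalar. The Config guard is also an assumption, added so the sketch still runs on a perl built without threads (where share is a no-op).

```perl
use strict;
use warnings;
use Config;
# Load threads only where this perl supports them (assumption for portability).
BEGIN { if ($Config{useithreads}) { require threads; threads->import } }
use threads::shared;

# put_shared(\%root, @keys, $value): hypothetical helper, hash keys only.
sub put_shared {
    my ($node, @path) = @_;
    my $value = pop @path;    # last argument is the leaf value
    my $leaf  = pop @path;    # the key that will hold it
    for my $key (@path) {
        # Create a missing level as a shared hash before descending into it.
        $node->{$key} = &share({}) unless ref($node->{$key}) eq 'HASH';
        $node = $node->{$key};
    }
    $node->{$leaf} = $value;
    return;
}

my %User : shared;
put_shared(\%User, qw/deep fish tuna/);
put_shared(\%User, qw/deep deeper fish halibut/);
print "$User{deep}{fish} and $User{deep}{deeper}{fish}\n";   # prints: tuna and halibut
```

Because each step assigns through the container ($node->{$key} = ...), every new level is shared before anything is stored inside it. Array subscripts and reference-valued leaves would need extra handling along the lines of the original PutShare.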

As an aside, I'm quite annoyed that this is even necessary. threads::shared should have provided autovivification built-in. Why make every user try to do it on his/her own?

Re: "Auto" vivifying shared data structures by jdhedden (Deacon) on Nov 17, 2005 at 20:53 UTC
Here's what I wrote for Object-InsideOut for creating a shared copy of a complex data structure:

# Make a copy of a complex data structure that is thread-shared.
# If not thread sharing, then make a 'regular' copy.
sub shared_copy
{
    my $in = $_[0];

    # Make copies of array, hash and scalar refs
    if (my $ref_type = ref($in)) {
        # Copy an array ref
        if ($ref_type eq 'ARRAY') {
            # Make empty shared array ref
            my $out = ($threads::shared::threads_shared)
                            ? &threads::shared::share([])
                            : [];
            # Recursively copy and add contents
            for my $val (@$in) {
                push(@$out, shared_copy($val));
            }
            return ($out);
        }

        # Copy a hash ref
        if ($ref_type eq 'HASH') {
            # Make empty shared hash ref
            my $out = ($threads::shared::threads_shared)
                            ? &threads::shared::share({})
                            : {};
            # Recursively copy and add contents
            while (my ($key, $val) = each(%$in)) {
                $out->{$key} = shared_copy($val);
            }
            return ($out);
        }

        # Copy a scalar ref
        if ($ref_type eq 'SCALAR') {
            if ($threads::shared::threads_shared) {
                return (threads::shared::share($in));
            }
            # If not sharing, then make a copy of the scalar ref
            my $out = \do{ my $scalar; };
            $$out = $$in;
            return ($out);
        }
    }

    # Just return anything else
    # NOTE: This will generate an error if we're thread-sharing,
    #       and $in is not an ordinary scalar.
    return ($in);
}

You would use it in your application by first creating a regular version of a data structure, and then running it through this code. The returned shared copy can then be added safely to whatever shared structure you're working with.

If you still find that doesn't work, then you may be bumping up against problems with threads under ActivePerl 5.8.4. See Re^3: New Module Announcement: Object::InsideOut.

Remember: There's always one more bug.
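
For readers on a newer perl than the 5.8.4 discussed here: later releases of threads::shared export a shared_clone function that does the same kind of recursive copy as shared_copy above. The following is a minimal sketch under that assumption; the data and variable names are made up for illustration, and the Config guard is only there so it also runs on a perl built without threads.

```perl
use strict;
use warnings;
use Config;
# Load threads only where this perl supports them (assumption for portability).
BEGIN { if ($Config{useithreads}) { require threads; threads->import } }
use threads::shared;   # exports shared_clone in later releases

my %User : shared;

# Build an ordinary nested structure first ...
my $plain = { fish => 'tuna', deeper => { fish => 'halibut' } };

# ... then store a fully shared copy of it under one key.
$User{deep} = shared_clone($plain);

print "$User{deep}{fish} and $User{deep}{deeper}{fish}\n";   # prints: tuna and halibut
```

Assigning $plain directly would fail under threads because its nested hash is not shared; the clone has every level shared, so it can be stored safely.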
Re^2: "Auto" vivifying shared data structures by Dr. Mu (Hermit) on Nov 17, 2005 at 21:17 UTC
Well, you've read my mind, apparently. If I couldn't get my in situ routine to work, the next step was going to be to copy a node from the shared hash, do the insertion work on it, then copy it back -- making sure, in the process, that all the subnodes are shared. If I go that route, your routine will come in handy. Thanks!
Re: "Auto" vivifying shared data structures by johnnywang (Priest) on Nov 17, 2005 at 20:53 UTC
I've encountered this same problem before: the fact that everything has to be explicitly shared. This becomes an even bigger problem if the data structure is produced by somebody else's code (such as XML::Simple). I've resorted to freeze/thaw, sharing the serialized scalar.
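
A minimal sketch of that freeze/thaw approach, assuming Storable's freeze and thaw and a single shared scalar holding the serialized data. The helper names store_struct and fetch_struct are hypothetical, as is the sample data; the Config guard is only there so the sketch also runs on a perl built without threads.

```perl
use strict;
use warnings;
use Config;
# Load threads only where this perl supports them (assumption for portability).
BEGIN { if ($Config{useithreads}) { require threads; threads->import } }
use threads::shared;
use Storable qw(freeze thaw);

my $frozen : shared;   # only this flat string is shared between threads

# Hypothetical helper: serialize a structure into the shared scalar.
sub store_struct {
    my ($data) = @_;
    lock($frozen);
    $frozen = freeze($data);
    return;
}

# Hypothetical helper: return a private, unshared copy of the structure.
sub fetch_struct {
    lock($frozen);
    return thaw($frozen);
}

store_struct({ deep => { fish => 'tuna' } });

# Update cycle: thaw a private copy, autovivify freely, freeze it back.
my $copy = fetch_struct();
$copy->{deep}{deeper}{shellfish} = 'crab';
store_struct($copy);

# Read the value back, from another thread where threads are available.
my $reader = sub { fetch_struct()->{deep}{deeper}{shellfish} };
my $seen;
if ($Config{useithreads}) {
    my $thr = threads->create($reader);
    $seen = $thr->join;
} else {
    $seen = $reader->();
}
print "reader saw: $seen\n";   # prints: reader saw: crab
```

Note that the lock is released between fetch_struct and store_struct, so two threads doing the thaw/modify/freeze cycle at once could overwrite each other's changes; a real application would need to hold the lock across the whole update.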
Re^2: "Auto" vivifying shared data structures by Dr. Mu (Hermit) on Nov 17, 2005 at 21:24 UTC
This, too, is an option I was considering. But I didn't know about freeze and thaw in the Storable package. (I was going to use Data::Dump and eval. Yecch!) Thanks for the pointer!
Re: "Auto" vivifying shared data structures by dave_the_m (Monsignor) on Nov 17, 2005 at 22:01 UTC
"threads::shared should have provided autovivification built-in. Why make every user try to do it on his/her own?"

Because shared structures are very expensive in terms of memory and execution time. Making it easy to have complex shared structures encourages bad habits.

Dave.
Re^2: "Auto" vivifying shared data structures by BrowserUk (Patriarch) on Nov 17, 2005 at 22:28 UTC
Just to reinforce the point, could you give some general information on how much extra space is required to share a structure? E.g., if a structure takes 10KB as a non-shared entity, how much would it cost to share this with, say, 2 other threads (3 total)? Is it 30KB, 40KB, 50KB extra?

Also, if you change a leaf (scalar) within that structure, how much slower is that than in a non-shared equivalent?

Finally, does the hit of updating the other copies of the modified value come all at once in the context of the thread that makes the modification? Or incrementally as the other sharing threads get a timeslice? Or does the sharing thread have to defer until each of the other threads gets a timeslice in order to update their copies?

If the latter is the case, does the time the modifying thread waits for the other threads to update depend on the vagaries of the scheduler calling those threads (in competition with all other threads in the system), before the modifying thread can continue and unlock the modified scalar?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: "Auto" vivifying shared data structures by dave_the_m (Monsignor) on Nov 17, 2005 at 22:58 UTC
Very roughly speaking, a shared variable is implemented internally using a form of tying. Each thread has its own copy of the tied variable, and the real value is stored in a separate area of shared memory. When you read a variable, the tied code gets called (XS rather than perl code), which locks the real shared value, then copies its value into the thread's private address space. Similarly, a write to a shared variable involves copying the private value into the shared variable. So in a worst-case scenario:

my $s : shared = 'a' x 10_000;
for (1..100) {
    async { my $l = length $s; sleep }
}

would consume about 1MB, with each thread having a cached copy of the string at the point it was accessed.

Nested structures are more complicated than that, but roughly speaking the worst case isn't as bad as the scalar case above.

Dave.
Re^2: "Auto" vivifying shared data structures by Dr. Mu (Hermit) on Nov 17, 2005 at 22:28 UTC
Nonetheless, a built-in facility or function to ease the process for the disciplined programmer would've been a nice touch. The contortions that we are forced to resort to in its absence betray an obvious deficit in the pragma's original vision. I can't agree that things that "encourage bad habits" are necessarily bad. Let's face it: Perl itself encourages bad habits -- as does any tool with similar power. It's up to the user to wield that power responsibly.