PerlMonks  

shared complex scalars between threads

by Anonymous Monk
on Oct 15, 2008 at 20:52 UTC (#717341, perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am writing a multithreaded Perl script and I want all threads to have access to an instance of the Algorithm::NaiveBayes classifier.

So my code goes like this:
my $nb : shared; $nb = Algorithm::NaiveBayes->new();
and then $nb gets passed to the subroutine being run by each thread. However, at run time I am getting an error about an invalid value for a shared scalar, and it points to the line where I initialize $nb. I read in other threads that Perl's sharing model can only share simple data such as hashes and lists, and only one level deep if these are nested.
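For reference, a minimal sketch that reproduces this kind of error without Algorithm::NaiveBayes. The `My::Model` class name and its contents are hypothetical stand-ins, not part of the original script; the assumption is that the classifier object behaves like any other blessed reference to unshared data.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $nb :shared;

# A shared scalar may hold plain values or references to shared data.
# Assigning a reference to ordinary (unshared) data dies at run time.
my $ok  = eval { $nb = bless( { model => 'stand-in' }, 'My::Model' ); 1 };
my $err = $ok ? '' : $@;
print "assignment failed: $err" unless $ok;

# A reference to data that is itself shared is accepted.
my %model :shared = ( model => 'stand-in' );
$nb = \%model;
print "shared ref stored: $nb->{model}\n";
```

The first assignment fails for the same reason the `new()` call does: the constructor returns a reference to data that threads::shared knows nothing about.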

So, in light of this restriction, is it possible to have several threads somehow share an instance of NaiveBayes? The reason I don't want each thread to have a separate private copy of $nb is that the trained model is quite large (~1GB), so it's a waste of memory to replicate it N times. The model is only accessed for classification and no changes are made to it (read-only) by any of the threads. So it seems like a prime candidate for sharing. But I can't get it to work. Any thoughts/suggestions?

Thanks in advance.

Replies are listed 'Best First'.
Re: shared complex scalars between threads
by BrowserUk (Patriarch) on Oct 15, 2008 at 23:45 UTC

    If you don't already have them, upgrade your threads and threads::shared modules to the latest CPAN versions. Recent versions have the ability to share objects:

    #! perl -slw
    use strict;
    use threads;
    use threads::shared;
    use Junk;

    print $threads::VERSION;
    print $threads::shared::VERSION;

    sub thread {
        my $obj = shift;
        my $tid = threads->self->tid;
        sleep $tid;
        $obj->add( $tid, time() );
        $obj->dump;
        return;
    }

    my $obj = Junk->new( abc => 123, pqr => 456 );
    my $shrObj :shared = shared_clone( $obj );

    my @threads = map threads->create( \&thread, $shrObj ), 1 .. 10;
    sleep 11;
    $shrObj->dump;
    $_->join for @threads;

    __END__
    c:\test>junk5
    1.71
    1.26
    bless({
      # tied threads::shared::tie
      1 => 1224113391,
      abc => 123,
      pqr => 456,
    }, "Junk")
    bless({
      # tied threads::shared::tie
      1 => 1224113391,
      2 => 1224113392,
      abc => 123,
      pqr => 456,
    }, "Junk")
    ...

    Junk.pm

    package Junk;
    use Data::Dump qw[ pp ];

    sub new {
        my $class = shift;
        return bless { @_ }, $class;
    }

    sub dump {
        my $self = shift;
        pp $self;
    }

    sub add {
        my( $self, $key, $value ) = @_;
        $self->{ $key } = $value;
        return;
    }

    1;

    I haven't done much with this ability, and I don't know how effective it is at sharing more complex objects, but it is worth a try. From a quick scan of the POD and the inside of the module, I don't get how it works, so I can't even make a prediction.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I had a poke at shared objects a little while ago.

      You can share an arbitrarily complicated structure, with refs to arrays, hashes and scalars, to any depth you like. The tricky bit, I found, was that with refs to shared arrays and hashes you have to create an empty anonymous array/hash, mark it shared and then populate it.
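      A minimal sketch of that create-empty, share, then populate pattern. The `%model` hash and its keys are made up for illustration; the `&share( [] )` call form (with the ampersand, to bypass the prototype) is the one the threads::shared documentation gives for sharing a freshly created anonymous array or hash.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %model :shared;

# Create each nested container empty, mark it shared, then populate it.
$model{labels} = &share( [] );
push @{ $model{labels} }, 'spam', 'ham';

$model{counts} = &share( {} );
$model{counts}{spam} = 3;
$model{counts}{ham}  = 7;

# A thread sees (and can modify) the same nested data, not a copy.
my $thr = threads->create( sub {
    lock( %model );
    $model{counts}{spam}++;
    return scalar @{ $model{labels} };
} );
my $n = $thr->join;

print "labels: $n, spam count: $model{counts}{spam}\n";
```

      The increment made inside the thread is visible in the parent after the join, which is what distinguishes shared nested data from a per-thread clone.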

      With objects the problem was that this meant that the object maker had to know to construct a shared object.

      As you say, the late model threads::shared claims:

      shared_clone REF

      shared_clone takes a reference, and returns a shared version of its argument, preforming (sic) a deep copy ...

      so, an object can be made shared after the event, now. But if the class adds new hashes or arrays to the object, without knowing they too need to be made shared, "fun" will ensue -- sure as eggs is eggs and this is not a pipe (also, no spoon). Not to mention the small issue of managing shared access to the object components.
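      A small sketch of the kind of "fun" meant here, using a plain nested hash as a hypothetical stand-in for an object. It assumes a threads::shared recent enough to provide shared_clone and is_shared.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $obj = shared_clone( { counts => { spam => 3 }, labels => [ 'spam', 'ham' ] } );

# The deep copy shares the nested containers as well as the top level.
my $nested_shared =
    ( is_shared( %{ $obj->{counts} } ) && is_shared( @{ $obj->{labels} } ) ) ? 1 : 0;
print "nested containers shared: $nested_shared\n";

# Code that is unaware of sharing and adds a plain anonymous array dies.
my $ok  = eval { $obj->{extra} = []; 1 };
my $err = $ok ? '' : $@;
print "adding unshared array failed: $err" unless $ok;

# Making the new container shared before storing it works.
$obj->{extra} = shared_clone( [] );
push @{ $obj->{extra} }, 'ok';
```

      So any method of the class that later does the equivalent of the failing assignment would have to be changed to share the new container first.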

      I'd be interested to hear how that goes !

        I'd be interested to hear how that goes !

        Me too. I have (and have expressed here before) misgivings about this feature, but it is too late in the day my time, and I am too wrapped up in my own code right now to fathom enough about the OP's problem and requirements to be able to construct a meaningful test. He is going to have to do that for himself.


      Thanks, I'll try that and let you guys know if it worked. If this works then I think it's preferable to having the complex object run in a separate process and exposed via an API since this would in effect serialize the access to it when several threads make a call to the API and request access to that object. I am trying to increase concurrency as much as possible without creating multiple copies of that large Bayesian model. Let's hope shared_clone works...
        I am trying to increase concurrency as much as possible without creating multiple copies of that large Bayesian model.

        Unfortunately, I don't think it will. I got to do a little more experimenting this morning and it seems that creating a shared object this way, (using threads::shared::bless()), simply replicates everything it contains for each thread that gets a handle ;( Sorry if I've wasted your time on this, but I thought that they were doing something clever, but it seems that is not the case. I cannot believe anyone thought this was a useful idea.

        The next possibility is to create the big object in a single thread and then have your other threads use it client-server fashion, passing messages detailing the request, and waiting for the reply. This could be done through a queue or individual shared scalars (or even sockets), but whichever way, the requests would effectively be serialised. If the requests are fairly long running and you are hoping to run multiple concurrent requests on different cpus, it isn't going to happen.
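        A rough sketch of the queue variant, assuming Thread::Queue is available. The `%model` lookup table is a hypothetical stand-in for the real classifier, and the per-client reply queues are just one way to route answers back.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $requests = Thread::Queue->new;
# One reply queue per client, created before any thread is started.
my %replies = map { $_ => Thread::Queue->new } 1 .. 3;

# Server thread: the only thread that ever builds or holds the model.
my $server = threads->create( sub {
    my %model = ( viagra => 'spam', meeting => 'ham' );   # stand-in model
    while ( defined( my $req = $requests->dequeue ) ) {
        my ( $client, $word ) = @$req;
        $replies{$client}->enqueue( $model{$word} // 'unknown' );
    }
} );

# Clients send [ id, word ] and block until their answer arrives.
my @words   = qw( viagra meeting lunch );
my @clients = map {
    my $id = $_;
    threads->create( sub {
        $requests->enqueue( [ $id, $words[ $id - 1 ] ] );
        return $replies{$id}->dequeue;
    } );
} 1 .. 3;

my @labels = map { $_->join } @clients;
$requests->enqueue( undef );    # undef tells the server loop to stop
$server->join;
print "@labels\n";
```

        Because the model is built inside the server thread, it exists once only, but every classification passes through that single thread, which is the serialisation described above.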

        I do not currently have a solution to offer you. One possibility involves modifying the internals of Algorithm::NaiveBayes to separate the bulk of the data (currently stored as a complex attribute of the object itself) from the rest of the object's internals, and then attempting to have multiple instances of the object share a single shared copy of that bulk data without cloning. I have a few ideas on this, but nothing that is ready for prime time.

        Once again. Sorry for any time you have wasted through the bum steer. I should have gone with my gut and stuck with my original misgivings about this feature.


Re: shared complex scalars between threads
by Illuminatus (Curate) on Oct 15, 2008 at 21:30 UTC
    One mechanism would be to create the object in a single thread, and then provide an API to it. Create a shared-data area to pass request/response data. Use semaphores to protect the shared-data, and to have the Bayes-thread wait on.
    1. acquire shared-data semaphore
    2. fill in shared-data request
    3. hit Bayes-thread semaphore
    4. Bayes-thread reads request, posts response
    5. requester reads response and frees shared-data semaphore
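    The steps above can be sketched roughly as follows, assuming Thread::Semaphore. The `%slot` area, the `classify` helper and the `%model` table are hypothetical names, and the third semaphore (`$response_ready`) is an addition not in the list above, so that the requester has something to wait on between steps 3 and 5.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my %slot :shared;                                  # shared request/response area
my $slot_free      = Thread::Semaphore->new(1);    # protects %slot
my $request_ready  = Thread::Semaphore->new(0);    # the Bayes thread waits on this
my $response_ready = Thread::Semaphore->new(0);    # the requester waits on this

my $bayes = threads->create( sub {
    my %model = ( viagra => 'spam', meeting => 'ham' );   # stand-in model
    while (1) {
        $request_ready->down;                      # wait for a request
        last if $slot{quit};
        $slot{response} = $model{ $slot{request} } // 'unknown';   # step 4
        $response_ready->up;
    }
} );

sub classify {
    my ($word) = @_;
    $slot_free->down;               # 1. acquire shared-data semaphore
    $slot{request} = $word;         # 2. fill in shared-data request
    $request_ready->up;             # 3. hit Bayes-thread semaphore
    $response_ready->down;          #    wait for the response to be posted
    my $label = $slot{response};    # 5. read response ...
    $slot_free->up;                 #    ... and free shared-data semaphore
    return $label;
}

my @workers = map { my $w = $_; threads->create( sub { classify($w) } ) }
              qw( viagra meeting lunch );
my @labels  = map { $_->join } @workers;

# Shut the Bayes thread down through the same shared area.
$slot_free->down;
$slot{quit} = 1;
$request_ready->up;
$bayes->join;
print "@labels\n";
```

    Only one request can occupy the shared area at a time, so requests are handled strictly one after another.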
Re: shared complex scalars between threads
by aufflick (Deacon) on Oct 15, 2008 at 23:50 UTC
    The two above give good suggestions. Your problem at the moment, I think, is that you are only sharing that variable - any deep structure is not shared. See this from threads::shared:

    "share" will traverse up references exactly one level. "share(\$a)" is equivalent to "share($a)", while "share(\\$a)" is not. This means that you must create nested shared data structures by first creating individual shared leaf nodes, then adding them to a shared hash or array.

    I haven't tried it, but as BrowserUk points out, the newer versions of threads::shared claim to do what you want.
