Re^5: Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?)

This is probably more of a response than you expected (or wanted :), but sometimes an innocent response triggers the coalescence of long deferred thought processes. This is one such occasion. I apologise in advance and offer you the option of skipping the rest of this post.

I've found that if you architect your application such that the threads module is only imported where needed, it all works out nicely. (Does a lot of data-copying on each thread initialisation.)

Yes. There are two features that could be added to the current implementation of threads that would, IMO, revolutionise their ease of use:

threads->reallyNew( \&code, ... );
This would start a new interpreter, do no cloning beyond those internal things necessary to make the interpreter work, and would simply run the coderef I provide. No closures. No cloned uses. Do nothing except what I choose to do explicitly.
50% of the weight problems and 90% of the startup delays could be avoided by this measure--and it would make programming threads much easier.
I wouldn't have to try and guess what might or might not be cloned into my threads, nor take extraordinary steps to arrange my code layout to try and avoid stuff getting cloned that I don't need or want in my threads.
This single factor, the 'let's pretend we are fork and clone everything in the current thread', is what makes it nearly impossible to encapsulate threads in modules.
If I write a class that creates a thread at some point, at either the class or instance level, I cannot know when the user will load the module--before or after they have already included the rest of CPAN in their application--so I cannot control how heavy the thread I start will be. Nor can I control the cloning of things--like DB handles--that should not be cloned.
And if the thread is created on a per-instance basis, each new instance will be different from, and heavier than, the previous one, because it will inherit anything that has been accumulated by the starting thread in the interim.
This is a nonsense.
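
The 'clone everything' behaviour complained about above can be seen in a minimal sketch, assuming a perl built with ithreads. The variable names are made up for illustration. The new thread receives a private copy of whatever the parent holds at creation time, so changes made inside the thread never reach the parent:

```perl
use strict;
use warnings;
use threads;

# Hypothetical parent-side data; everything visible here is cloned
# into each new thread at creation time.
my @parent_data = (1 .. 5);

my $thr = threads->create(sub {
    push @parent_data, 6;          # modifies the thread's private copy only
    return scalar @parent_data;    # size as seen inside the thread
});

my $size_in_thread = $thr->join;      # the thread saw 6 elements
my $size_in_parent = @parent_data;    # the parent's array is untouched

print "thread saw $size_in_thread elements, parent still has $size_in_parent\n";
```

The same copying applies to every loaded module and every open handle, which is why the weight of a thread depends on when it is started rather than on what it does.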
A threads::shared::share() that does what it says on the tin--ie. shares what I ask it to share.
The current design, which
- will take a named aggregate object, but not an anonymous one (by-passable), and
- silently empties that object before sharing the base reference,
is so broken as to be ridiculous. How anyone ever thought that silently discarding user data was a good idea beggars belief.
This decision single-handedly makes using shared compound data structures so difficult that it creates the need for hacks like Thread::Queue::Any that use Storable to freeze and thaw structures for exchange.
It is possible to create your own recursive share() using something like Data::Rmap, but getting it right is distinctly non-trivial. Trying to debug it when it goes wrong requires deep understanding of the internals--well beyond my superficial understanding. It requires the use of debug builds of perl and access to, and the knowledge of how to use, all the tools like gdb and purify and others that the internals guys use.
This is exactly the type of code that should live in the core and be maintained in conjunction with the core.
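
A minimal sketch of the emptying behaviour, assuming a perl built with ithreads; the data and variable names are hypothetical. It also uses shared_clone(), which later releases of threads::shared provide as a recursive, copying share. That function postdates this post and is shown only as one possible shape for a share() that keeps the data:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical data, populated before it is shared.
my %settings = ( host => 'localhost', port => 8080 );
share(%settings);                         # existing contents are lost at this point
my $keys_after_share = keys %settings;    # nothing is left to count

# shared_clone() copies a nested structure into shared storage, data included.
my $deep = shared_clone({ host => 'localhost', ports => [ 80, 443 ] });

my $thr = threads->create(sub {
    push @{ $deep->{ports} }, 8080;       # visible to the parent: the array is shared
    return;
});
$thr->join;

my $port_count = @{ $deep->{ports} };     # includes the element pushed by the thread
print "keys after share(): $keys_after_share, ports after thread: $port_count\n";
```

Note that shared_clone() shares a copy, so the original unshared structure and the shared one are separate from then on.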

I've been banging on about both for years, and I've attempted to garner understanding and support for these changes, but to no avail. I had high hopes when the CPAN version of threads was born...

I even send complex objects (after having structured the objects such that re-bless and $obj->init($DATA) works in unit tests).

Personally, I do not think that either the re-blessing trick or proxy-object modules are a good idea. Hence my reluctance to use or advocate them. In essence, I do not believe that sharing objects across threads is either a good design decision or ever really necessary.

Trans-thread object instances create the need for, and all the potential pitfalls and caveats of, inter-thread synchronisation. Threads should only synchronise at the end of their lives, or via well-defined, mono-directional, asynchronous message-passing mechanisms. Eg. Queues.

All other class-based thread usage should be contained within an object, not extra to it. Ie. If an object instance needs to do something slow or blocking, it should pass just that part of its state to a thread dedicated to performing that operation. It then checks for the result, and retrieves it if available, the next time the code that owns the object attempts to access the object state derived by that operation.

At that point, there are two possible scenarios.

- The slow or blocking operation has completed.
  The object will just return the results to the owner.
- The results are not yet available.
  Here there are three possible requirements:
  1. The owner needs the results at this point in the flow of the calling code and is prepared to wait for them.
     The owner calls a blocking operation to retrieve it.
  2. The owner has other things he can get on with pending the availability of the results.
     The object returns some value to indicate that the results are not yet available (say undef), and the owner can decide when to retry.
  3. The owner is happy to be given the last good value of the affected state and continue with that.
     The object returns the last good value immediately. The owner will get the new value of the affected state the first time he calls for it, after the slow or blocking operation completes.
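
The three requirements above can be sketched as an object that owns its worker thread. This is only an illustration under stated assumptions: the class name LazyResult and its method names are invented here, the slow operation is assumed to return a single defined scalar, and $gate exists only so the demo can control when the "slow" work finishes.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

package LazyResult;    # hypothetical class name, not an existing module

# Start the slow operation in a dedicated thread. Only the coderef is handed
# over; the object itself stays with its owner and is never shared.
sub new {
    my ($class, $slow_op, $initial) = @_;
    my $results = Thread::Queue->new;
    my $worker  = threads->create(sub { $results->enqueue($slow_op->()); return });
    return bless {
        results => $results, worker => $worker,
        last_good => $initial, done => 0,
    }, $class;
}

# Internal: record a freshly arrived result and reap the worker thread.
sub _store {
    my ($self, $value) = @_;
    @{$self}{qw(last_good done)} = ($value, 1);
    $self->{worker}->join;
    return $value;
}

# Requirement 1: the owner is prepared to wait for the result.
sub wait_for_value {
    my ($self) = @_;
    return $self->{last_good} if $self->{done};
    return $self->_store($self->{results}->dequeue);
}

# Requirement 2: return undef if the result has not arrived yet.
sub poll_value {
    my ($self) = @_;
    return $self->{last_good} if $self->{done};
    my $value = $self->{results}->dequeue_nb;
    return defined $value ? $self->_store($value) : undef;
}

# Requirement 3: return at once with the last good value.
sub current_value {
    my ($self) = @_;
    $self->poll_value;    # pick up a new result if one has arrived
    return $self->{last_good};
}

package main;

my $gate = Thread::Queue->new;
my $obj  = LazyResult->new(sub { $gate->dequeue; return 42 }, 0);

my $polled = $obj->poll_value;        # undef: the worker is still blocked on $gate
my $stale  = $obj->current_value;     # the initial "last good" value
$gate->enqueue('go');                 # let the slow operation complete
my $fresh  = $obj->wait_for_value;    # blocks until the worker delivers its result
```

The owner never touches a lock: the only cross-thread traffic is one value travelling one way through a queue, and the block, poll or carry-on decision is made by whichever method the owner calls.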

That is not a good description, but it is probably the best I have written in the 3 years I've been thinking about it. The point is to move the code that handles concurrency inside the object, where it can be written once and reused, but leave the 'do I block, poll or come back later' decision to the object owner.

The best example of this interface is Thread::Queue. With its provision of ->dequeue(), ->dequeue_nb() and ->pending() methods, it addresses each of the scenarios above, giving its owners complete control, without ever requiring them to think about locking.
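
For reference, a short sketch of those three Thread::Queue methods, assuming a perl built with ithreads. The producer is joined before the reads purely to keep the demo deterministic; a real consumer would normally read while the producer is still running.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

my $nothing_yet = $queue->dequeue_nb;    # undef: non-blocking read of an empty queue

my $producer = threads->create(sub { $queue->enqueue($_) for 1 .. 3; return });
$producer->join;                         # joined here only to make the demo deterministic

my $waiting = $queue->pending;           # number of items waiting to be read
my $first   = $queue->dequeue;           # would block if the queue were empty
my $second  = $queue->dequeue_nb;        # returns at once whether or not an item is there

print "pending: $waiting, first: $first, second: $second\n";
```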

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."


Replies are listed 'Best First'.
Re^6: Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?)
by erroneousBollock (Curate) on Aug 10, 2007 at 08:32 UTC

threads->reallyNew( \&code, ... );
This would start a new interpreter, do no cloning beyond those internal things necessary to make the interpreter work, and would simply run the coderef I provide. No closures. No cloned uses. Do nothing except what I choose to do explicitly.

I like that idea. I do however enjoy the syntax of using closures to pass data to new threads, so some intermediate API may also be desirable.

Example:

  my $iq = $control->input_queue;
  my $oq = $control->output_queue;
  threads->new(sub {
    ...
    # use $iq/$oq in here.
    ...
  });

A threads::shared::share() that does what it says on the tin--ie. shares what I ask it to share. The current design
* that will take a named aggregate object, but not an anonymous one (by-passable).
* And silently empties that object before sharing the base reference.

is so broken as to be ridiculous. How anyone ever thought that silently discarding user data was a good idea beggars belief.

Amen, re discarding the initial value... is that not just bad use of tie()? (or maybe I misunderstand the mechanism)

In general, I find very few examples (in the kind of applications I deal with) where fast access to tons of shared data is really necessary, so at least for me, Storable doesn't really pose too many speed/correctness issues.

I have to admit I feel dirty every time I touch a lock... where possible I like to design data-structures and APIs such that I don't have to reason too deeply about locks when I go to maintain them. Of course that imposes certain problems with speed/granularity/composability... and is the reason I posted Where is concurrency going? Is Perl going there?

The best example of this interface is Thread::Queue. With its provision of ->dequeue(), ->dequeue_nb() and ->pending() methods, it addresses each of the scenarios above, giving its owners complete control, without ever requiring them to think about locking.

Agreed... it really boils down to how easy it is to reason about imperative code.

Robust message-passing simplifies design, and reduces the concurrency problem to: "How do I avoid a deadlock?"

I think it'd be nice to have more advanced concurrency primitives available... but at this stage that's a pipe-dream... it's a really good idea to get the shared-memory stuff into good shape and into the core.

If it's not too much of a problem, I'd also like to be able to switch the method by which data is shared as a pragma (eg: pthread, shmem, mmap, etc)...

-David
