in reply to Re^4: Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?)
in thread Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?)
This is probably more of a response than you expected (or wanted :), but sometimes an innocent remark triggers the coalescence of long-deferred thought processes. This is one such occasion. I apologise in advance and offer you the option of skipping the rest of this post.
> I've found that if you architect your application such that the threads module is only imported where needed, it all works out nicely. (Does a lot of data-copying on each thread initialisation.)
Yes. There are two features that could be added to the current implementation of threads that would, IMO, revolutionise their ease of use:
The first is a way to create a thread that clones nothing. It would start a new interpreter, do no cloning beyond the internal things necessary to make the interpreter work, and simply run the coderef I provide. No closures. No cloned copies of modules I have already loaded. Nothing except what I explicitly choose to do.
By my estimate, this measure alone would avoid 50% of the memory-weight problems and 90% of the startup delays, and it would make programming threads much easier.
I wouldn't have to try to guess what might or might not be cloned into my threads, nor take extraordinary steps to arrange my code layout so that stuff I don't need or want in my threads doesn't get cloned into them.
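Until something like that exists, the code-layout workaround amounts to: create the worker threads as early as possible, while the interpreter is still small, and pull in anything heavy with a run-time require only afterwards. The following is a minimal sketch that assumes a Perl built with ithreads. The summing worker is purely illustrative, and Data::Dumper merely stands in for "the rest of CPAN".

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work = Thread::Queue->new;

# 1. Spawn the workers first, while the interpreter holds little beyond
#    strict, warnings, threads and Thread::Queue, so each ithread clones
#    as little as possible.
my @workers;
for (1 .. 2) {
    my $thr = threads->create(sub {
        my $sum = 0;
        while (defined(my $n = $work->dequeue)) {
            $sum += $n;
        }
        return $sum;
    });
    push @workers, $thr;
}

# 2. Only now load what the main thread needs. A run-time require executes
#    after the workers were cloned, so they never carry a copy of it.
require Data::Dumper;

$work->enqueue($_) for 1 .. 10;
$work->enqueue(undef) for @workers;    # one end-of-work marker per worker

my $total = 0;
$total += $_->join for @workers;
print "total: $total\n";               # total: 55
```

This only works when the application controls the order of events. A module that creates threads has no such control, which is the point made in the following paragraphs.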
This single factor, the "let's pretend we are fork and clone everything in the current thread" approach, is what makes it nearly impossible to encapsulate threads in modules.
If I write a class that creates a thread at some point, at either the class or the instance level, I cannot know when the user will load the module (before or after they have already included the rest of CPAN in their application), so I cannot control how heavy the thread I start will be. Nor can I control the cloning of things, like DB handles, that should not be cloned.
And if the thread is created on a per-instance basis, each new instance's thread will be different from, and heavier than, the previous one, because it will inherit everything the starting thread has accumulated in the interim.
This is a nonsense.
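The second is a recursive share(). The current design of share(), which throws away any data an array or hash already holds when it is shared, is so broken as to be ridiculous. How anyone ever thought that silently discarding user data was a good idea beggars belief.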
This decision single-handedly makes using shared compound data structures so difficult that it creates the need for hacks like Thread::Queue::Any that use Storable to freeze and thaw structures for exchange.
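For readers who have not seen the trick, the freeze-and-thaw workaround flattens a nested structure into a plain string, which a queue can carry, and then rebuilds it on the other side. The minimal sketch below uses Storable and Thread::Queue directly. It illustrates the idea only and is not Thread::Queue::Any's actual code.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Storable qw(freeze thaw);

my $q = Thread::Queue->new;

# Consumer: receives a frozen string and rebuilds an ordinary, unshared,
# fully independent copy of the structure.
my $consumer = threads->create(sub {
    my $data = thaw($q->dequeue);
    return "$data->{meta}{id}:" . scalar @{ $data->{items} };
});

# Producer: the nested structure is serialised to a plain string, which the
# queue can carry where a nested non-shared reference could not go.
$q->enqueue(freeze({ items => [ 1, 2, 3 ], meta => { id => 7 } }));

my $summary = $consumer->join;
print "$summary\n";    # 7:3
```

Each side ends up with its own copy, so this approach moves data and does not share it. The cost is a full serialise/deserialise on every exchange.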
It is possible to create your own recursive share() using something like Data::Rmap, but getting it right is distinctly non-trivial. Debugging it when it goes wrong requires a deep understanding of the internals, well beyond my superficial understanding. It requires debug builds of perl, plus access to (and the knowledge of how to use) tools like gdb, Purify and the others that the internals developers use.
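As a rough illustration of what a recursive share() has to do, the sketch below builds a shared deep copy of plain scalars, arrays and hashes. It copies rather than converting in place, so the caller's data survives. deep_share is a hypothetical helper, not a module function. It deliberately ignores blessed objects, code refs, globs and circular references (a cycle would recurse forever), and those are exactly the cases where such code goes subtly wrong. Later releases of threads::shared provide shared_clone(), which performs this kind of deep copy.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical helper: returns a shared deep copy of plain scalars, arrays
# and hashes. Not handled: blessed objects, CODE/GLOB refs, cycles.
sub deep_share {
    my ($data) = @_;
    my $type = ref $data;
    return $data if $type eq '';         # plain scalar (or undef): copy as-is

    if ($type eq 'ARRAY') {
        my $copy = &share([]);           # fresh, empty, shared anonymous array
        push @$copy, deep_share($_) for @$data;
        return $copy;
    }
    if ($type eq 'HASH') {
        my $copy = &share({});           # fresh, empty, shared anonymous hash
        $copy->{$_} = deep_share($data->{$_}) for keys %$data;
        return $copy;
    }
    die "deep_share: unsupported reference type '$type'\n";
}

my $config = { name => 'demo', ports => [ 80, 443 ], opts => { tls => 1 } };
my $shared = deep_share($config);

# The child modifies the nested array. The parent sees the change because
# every level of the copy is shared.
my $thr = threads->create(sub { push @{ $shared->{ports} }, 8080; return });
$thr->join;

print join(',', @{ $shared->{ports} }), "\n";    # 80,443,8080
print scalar @{ $config->{ports} }, "\n";        # 2 (original untouched)
```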
This is exactly the type of code that should live in the core and be maintained in conjunction with the core.
I've been banging on about both for years, and I've attempted to garner understanding and support for these changes, but to no avail. I had high hopes when the CPAN version of threads was born.
> I even send complex objects (after having structured the objects such that re-bless and $obj->init($DATA) works in unit tests).
Personally, I do not think that using either the re-blessing trick or proxy-object modules is a good idea. Hence my reluctance to use or advocate them. In essence, I do not believe that sharing objects across threads is either a good design decision or ever really necessary.
Trans-thread object instances create the need for inter-thread synchronisation, with all its potential pitfalls and caveats. Threads should synchronise only at the end of their lives, or via well-defined, mono-directional, asynchronous message-passing mechanisms, e.g. queues.
All other class-based thread usage should be contained within an object, not external to it. That is, if an object instance needs to do something slow or blocking, it should pass just the relevant part of its state to a thread dedicated to performing that operation. The object then checks for, and if available retrieves, the result the next time the code that owns the object attempts to access the state derived from that operation.
At that point, there are two possible scenarios.
The object will just return the results to the owner. Here there are three possible requirements:
  a) The owner calls a blocking operation to retrieve them.
  b) The object returns some value (say undef) to indicate that the results are not yet available, and the owner decides when to retry.
  c) The object immediately returns the last good value. The owner gets the new value of the affected state the first time it asks for it after the slow or blocking operation completes.
That is not a good description, but it is probably the best I have written in the 3 years I've been thinking about it. The point is to move the code that handles concurrency inside the object, where it can be written once and reused, but leave the "do I block, poll or come back later" decision to the object's owner.
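A minimal sketch of that pattern follows. SlowValue is a hypothetical class that owns one dedicated worker thread. Its start() method hands off the slow call, and the owner picks the retrieval policy on each call: get_wait() blocks, get_nb() returns undef until the result is ready, and get_last() returns the last good value. The class name, method names and two-queue layout are illustrative assumptions, not an existing module's API. The sketch also assumes the slow code never returns undef, because undef is the "not ready" signal.

```perl
use strict;
use warnings;

package SlowValue;    # hypothetical class, for illustration only
use threads;
use Thread::Queue;

# Starts one worker thread dedicated to running $code on each request and
# posting the result back. The owner never sees a lock.
sub new {
    my ($class, $code) = @_;
    my $requests = Thread::Queue->new;
    my $replies  = Thread::Queue->new;
    my $worker   = threads->create(sub {
        while (defined(my $arg = $requests->dequeue)) {
            $replies->enqueue($code->($arg));
        }
        return;
    });
    return bless {
        requests => $requests, replies => $replies, worker => $worker,
        pending  => 0,         last    => undef,
    }, $class;
}

# Hands the slow operation to the worker and returns immediately.
sub start {
    my ($self, $arg) = @_;
    $self->{pending}++;
    $self->{requests}->enqueue($arg);
    return;
}

# Collects one finished result; returns 1 if it got one, 0 otherwise.
# Assumes $code never returns undef, which is used to mean "not ready".
sub _collect {
    my ($self, $block) = @_;
    return 0 unless $self->{pending};
    my $r = $block ? $self->{replies}->dequeue : $self->{replies}->dequeue_nb;
    return 0 unless defined $r;
    $self->{pending}--;
    $self->{last} = $r;
    return 1;
}

# a) Block until every outstanding result has arrived.
sub get_wait { my $self = shift; 1 while $self->_collect(1); return $self->{last} }

# b) Return the result if ready, otherwise undef; the owner retries later.
sub get_nb { my $self = shift; 1 while $self->_collect(0); return $self->{pending} ? undef : $self->{last} }

# c) Return the last good value immediately, picking up a new one if ready.
sub get_last { my $self = shift; 1 while $self->_collect(0); return $self->{last} }

# Stops the worker and waits for it to exit.
sub finish {
    my $self = shift;
    $self->{requests}->enqueue(undef);
    $self->{worker}->join;
    return;
}

package main;

my $obj = SlowValue->new(sub { my $n = shift; sleep 1; return $n * 2 });
$obj->start(21);
my $stale = $obj->get_last;   # c) undef unless the worker has already finished
my $poll  = $obj->get_nb;     # b) likewise undef while it is still busy
my $value = $obj->get_wait;   # a) blocks until the result arrives
print "$value\n";             # 42
$obj->finish;
```

With the current threads implementation, creating the worker inside new() still clones whatever the program has loaded at that moment. That is precisely the weight problem described earlier, and a non-cloning thread would remove it.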
The best example of this interface is Thread::Queue. With its ->dequeue(), ->dequeue_nb() and ->pending() methods, it addresses each of the scenarios above, giving its owners complete control without ever requiring them to think about locking.
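The following small sketch shows those three calls in use. A hypothetical producer thread stands in for the slow operation. The owner blocks once with dequeue(), then polls with pending() and dequeue_nb(), and it is free to do other work between polls.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new;

# Stand-in for a slow operation: three results, roughly 0.2 s apart.
my $producer = threads->create(sub {
    for my $n (1 .. 3) {
        select(undef, undef, undef, 0.2);    # sub-second sleep
        $q->enqueue($n * 10);
    }
    return;
});

my @got = ($q->dequeue);    # dequeue(): block until the first item arrives
while (@got < 3) {
    if ($q->pending) {                       # pending(): anything waiting?
        push @got, $q->dequeue_nb;           # dequeue_nb(): take it, never blocks
    }
    else {
        # Nothing yet: the owner could do other work here instead of waiting.
        select(undef, undef, undef, 0.05);
    }
}
$producer->join;
print "@got\n";    # 10 20 30
```

Because this code has a single consumer, an item that pending() reports is still there when dequeue_nb() runs. With several consumers, dequeue_nb()'s undef return would have to be checked instead.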
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?)
by erroneousBollock (Curate) on Aug 10, 2007 at 08:32 UTC