This is probably more of a response than you expected (or wanted :), but sometimes an innocent response triggers the coalescence of long-deferred thought processes. This is one such occasion. I apologise in advance and offer you the option of skipping the rest of this post.

I've found that if you architect your application such that the threads module is only imported where needed, it all works out nicely. (Does a lot of data-copying on each thread initialisation.)

Yes. There are two features that could be added to the current implementation of threads that would, IMO, revolutionise their ease of use:

  1. threads->reallyNew( \&code, ... );

    This would start a new interpreter, do no cloning beyond those internal things necessary to make the interpreter work, and would simply run the coderef I provide. No closures. No cloned uses. Do nothing except what I choose to do explicitly.

    50% of the weight problems and 90% of the startup delays could be avoided by this measure--and it would make programming threads much easier.

    I wouldn't have to try and guess what might or might not be cloned into my threads, nor take extraordinary steps to arrange my code layout to try and avoid stuff getting cloned that I don't need or want in my threads.

    This single factor, the 'let's pretend we are fork and clone everything in the current thread', is what makes it nearly impossible to encapsulate threads in modules.

    If I write a class that creates a thread at some point, at either the class or instance level, I cannot know when the user will load the module--before or after they have already included the rest of CPAN in their application--so I cannot control how heavy the thread I start will be. Nor can I control the cloning of things--like DB handles--that should not be cloned.

    And if the thread is created on a per-instance basis, each new instance will be different from, and heavier than, the previous one, because it will inherit anything that has been accumulated by the starting thread in the interim.

    This is a nonsense.

  2. A threads::shared::share() that does what it says on the tin--ie. shares what I ask it to share.

    The current design,

    • which will take a named aggregate object, but not an anonymous one (though that can be by-passed),
    • and which silently empties that object before sharing the base reference,

    is so broken as to be ridiculous. How anyone ever thought that silently discarding user data was a good idea beggars belief.

    This decision single-handedly makes using shared compound data structures so difficult that it creates the need for hacks like Thread::Queue::Any that use Storable to freeze and thaw structures for exchange.

    It is possible to create your own recursive share() using something like Data::Rmap, but getting it right is distinctly non-trivial. Trying to debug it when it goes wrong requires a deep understanding of the internals--well beyond my superficial understanding. It also requires debug builds of perl, plus access to, and knowledge of how to use, tools like gdb, purify and the others that the internals guys use.

    This is exactly the type of code that should live in the core and be maintained in conjunction with the core.
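As a rough illustration of the share() behaviour described in point 2, here is a minimal sketch. It assumes a perl built with ithreads. The helper share_deep() is hypothetical and deliberately naive: it handles only plain hashes, arrays and scalars, and ignores cycles, blessed objects and other reference types. shared_clone() is only available in newer threads::shared releases.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# 1. share() on a populated aggregate discards its contents.
my %config = ( host => 'localhost', port => 8080 );
share(%config);
my $left = scalar keys %config;    # nothing survives the call

# 2. Hypothetical helper: deep-copy a structure into shared storage.
#    Only HASH and ARRAY references are handled; any other kind of
#    reference will make the assignment into shared storage die.
sub share_deep {
    my ($data) = @_;
    my $type = ref $data;
    if ( $type eq 'HASH' ) {
        my $copy = &share( {} );    # new, empty shared hash
        $copy->{$_} = share_deep( $data->{$_} ) for keys %$data;
        return $copy;
    }
    if ( $type eq 'ARRAY' ) {
        my $copy = &share( [] );    # new, empty shared array
        push @$copy, share_deep($_) for @$data;
        return $copy;
    }
    return $data;                   # plain scalars are copied as-is
}

my $tree = share_deep( { name => 'job', steps => [ 1, 2, 3 ] } );

# 3. Newer threads::shared releases provide shared_clone() for the same job.
my $clone = shared_clone( { name => 'job', steps => [ 1, 2, 3 ] } );

# Changes made in another thread are visible here because the data is shared.
threads->create( sub {
    $tree->{steps}[0]  = 99;
    $clone->{steps}[0] = 99;
    return;
} )->join;

print "left=$left tree=$tree->{steps}[0] clone=$clone->{steps}[0]\n";
```

The first part shows why populating a structure before sharing it loses the data; the other two build the shared copy element by element instead.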

I've been banging on about both for years, and I've attempted to garner understanding and support for these changes, but to no avail. I had high hopes when the CPAN version of threads was born...
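To make the cloning behaviour from point 1 concrete, here is a small sketch of the usual workaround with the current implementation: spawn threads early and load modules inside them. It assumes a perl built with ithreads and that nothing else in the program has already loaded POSIX; the variable names are illustrative only.

```perl
use strict;
use warnings;
use threads;

# Start the worker while the parent interpreter is still small.
my $early = threads->create( sub {
    require POSIX;                 # loaded inside this thread only
    return POSIX::floor(3.7);
} );

# Built after $early was spawned, so $early never receives a copy.
my @big = ( 1 .. 100_000 );

# A thread started now gets its own private clone of @big.
my $late = threads->create( sub {
    @big = ();                     # empties the clone, not the parent's array
    return scalar @big;
} );

my $floor     = $early->join;
my $in_thread = $late->join;
printf "floor=%d, thread saw %d elements, parent still has %d\n",
    $floor, $in_thread, scalar @big;
```

The second thread pays for a copy of everything the parent holds at the moment it is created, which is exactly what a module author cannot control on behalf of the caller.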

I even send complex objects (after having structured the objects such that re-bless and $obj->init($DATA) works in unit tests).

Personally, I do not think that using either the re-blessing trick or proxy-object modules is a good idea. Hence my reluctance to use or advocate them. In essence, I do not believe that sharing objects across threads is either a good design decision or ever really necessary.

Trans-thread object instances create the need for, and all the potential pitfalls and caveats of, inter-thread synchronisation. Threads should only synchronise at the end of their lives, or via well-defined, mono-directional, asynchronous message-passing mechanisms, e.g. queues.
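A minimal sketch of that style, assuming a perl built with ithreads: two Thread::Queue objects, each used in one direction only, with undef as an agreed stop signal (that convention is an assumption of this example, not something the module imposes).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work    = Thread::Queue->new;    # parent -> worker only
my $results = Thread::Queue->new;    # worker -> parent only

my $worker = threads->create( sub {
    # dequeue() blocks until an item arrives; undef means "stop".
    while ( defined( my $n = $work->dequeue ) ) {
        $results->enqueue( $n * $n );
    }
    return;
} );

$work->enqueue( 1 .. 5 );
$work->enqueue(undef);               # tell the worker to finish
$worker->join;                       # synchronise at the end of its life

my @squares;
while ( defined( my $r = $results->dequeue_nb ) ) {
    push @squares, $r;
}
print "@squares\n";
```

Neither side touches the other's data, so no explicit locking appears anywhere in the calling code.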

All other class-based thread usage should be contained within an object, not extra to it. I.e. if an object instance needs to do something slow or blocking, it should pass just that part of its state to a thread dedicated to performing that operation. It then checks for and, if available, retrieves the result the next time the code that owns the object attempts to access the object state derived by that operation.

At that point, there are two possible scenarios.

  1. The slow or blocking operation has completed.

    The object will just return the results to the owner.

  2. The results are not yet available.

    Here there are three possible requirements:

    1. The owner needs the results at this point in the flow of the calling code and is prepared to wait for them.

      The owner calls a blocking operation to retrieve it.

    2. The owner has other things he can get on with pending the availability of the results.

      The object returns some value to indicate that the results are not yet available (say undef), and the owner can decide when to retry.

    3. The owner is happy to be given the last good value of the affected state and continue with that.

      The object returns the last good value immediately. The owner will get the new value of the affected state the first time he calls for it, after the slow or blocking operation completes.
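The scenarios above could be sketched roughly as follows. This is only one possible shape, assuming a perl built with ithreads: the class name SlowValue and its methods (refresh, get, get_nb, get_last) are hypothetical, and the object simply joins its worker thread rather than using a queue.

```perl
use strict;
use warnings;
use threads;

{
    package SlowValue;    # hypothetical class, for illustration only

    sub new {
        my ( $class, $code, $initial ) = @_;
        return bless { code => $code, last_good => $initial, thr => undef }, $class;
    }

    # Pass just the needed inputs to a thread dedicated to the slow operation.
    sub refresh {
        my ( $self, @input ) = @_;
        $self->_collect(1);    # let any earlier run finish first
        $self->{thr} = threads->create( { context => 'scalar' }, $self->{code}, @input );
        return $self;
    }

    # Fold a finished result into the object; true if nothing is outstanding.
    sub _collect {
        my ( $self, $wait ) = @_;
        my $thr = $self->{thr};
        return 1 unless defined $thr;
        return 0 unless $wait || $thr->is_joinable;
        $self->{last_good} = $thr->join;
        $self->{thr}       = undef;
        return 1;
    }

    # Owner is prepared to wait for the result.
    sub get { my $s = shift; $s->_collect(1); return $s->{last_good} }

    # Owner polls: undef means "not ready yet, retry later".
    sub get_nb { my $s = shift; return $s->_collect(0) ? $s->{last_good} : undef }

    # Owner accepts the last good value and carries on.
    sub get_last { my $s = shift; $s->_collect(0); return $s->{last_good} }
}

my $v = SlowValue->new( sub { sleep 1; return $_[0] * 2 }, 0 );
$v->refresh(21);
my $stale = $v->get_last;    # last good value unless the thread has already finished
my $fresh = $v->get;         # blocks until the thread has finished
print "fresh=$fresh\n";
```

All the thread handling lives inside the class; the owner only chooses which of the three accessors to call.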

That is not a good description, but is probably the best I have written in the 3 years I've been thinking about it. The point is to move the code to handle concurrency inside the object where it can be written once and reused, but leave the 'do I block, poll or come back later' decision to the object owner.

The best example of this interface is Thread::Queue. With its provision of ->dequeue(), ->dequeue_nb() and ->pending() methods, it addresses each of the scenarios above, giving its owners complete control, without ever requiring them to think about locking.
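A short sketch of those three calls side by side, assuming a perl built with ithreads; the one-second sleep merely stands in for a slow operation.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q   = Thread::Queue->new;
my $thr = threads->create( sub { sleep 1; $q->enqueue('done'); return } );

my $waiting = $q->pending;       # how many items are queued right now
my $item    = $q->dequeue_nb;    # returns at once; undef if the queue is empty

# Nothing there yet? Then decide to wait: dequeue() blocks until an item arrives.
$item = $q->dequeue unless defined $item;

$thr->join;
print "got $item, ", $q->pending, " left\n";
```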


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^5: Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?) by BrowserUk
in thread Parallel::ForkManager and vars from the parent script - are they accessible? (or do I need inter process communication?) by isync
