BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone (jdhedden?) understand why, if a glob can be successfully cloned into a new thread, it (technically) cannot be shared ("Cannot share globs yet") or assigned to a shared scalar ("Invalid value for shared scalar")?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re: Sharing globs (tuits)
by tye (Sage) on Oct 27, 2007 at 15:46 UTC

    Cloning requires copying. Sharing requires locking and triggers on any updates so the updates get applied to the other thread(s). Clearly, nobody has yet written the code to put all of the locking and triggers in place for globs.

    Now, there certainly could be more to it than that. But I can certainly understand writing the code to lock and trigger for scalars before writing the code to lock and trigger for globs (since that code for globs isn't of much use if you don't have the code for all of the items contained in a glob already anyway).

    - tye        

      Well yeah, but that doesn't help me work out what needs to be done to change it. I can see and disable all the checks that prevent it, but (unsurprisingly) the results segfault. Sometimes.

      So I'm trying to understand the technical reasons why it hasn't been done before now--not the social ones.


Re: Sharing globs
by renodino (Curate) on Oct 27, 2007 at 17:12 UTC
    (The following is mostly supposition, based on my current experience implementing Thread::Sociable)
    1. threads::shared is based on tie()s, which don't completely support handles yet (though 5.10 looks promising).
    2. How to deal with multiple threads writing to, or especially closing, a handle (more complicated than just refcounting).
    3. The fact that the handle needs to be instantiated in the shared interpreter: how does it get there? Perhaps a new open(), e.g., shared_open()? Or predeclare a lexical handle variable as shared?
    4. Maintaining the illusion of fork() on Windows.
    I've vague notions about trying to hack something via PerlIO layers and routing through shared (or rather, sociable) scalars to an I/O object, but need to better understand how the clone operation works. (And based on some recent p5p posts, it appears PerlIO isn't entirely thread-safe.)

    However, the tie() approach with a special open() to install things into the shared interpreter is looking more attractive to me (despite the beached whale issue, aka the global shared interpreter lock), though it's likely to require big gobs of XS/C.


    Perl Contrarian & SQL fanboy
Re: Sharing globs
by renodino (Curate) on Nov 02, 2007 at 00:49 UTC
    Serves me right for yakking before thinking...

    I just realized the Big Issue with installing the handle context in the shared interpreter, and once again it's the beached whale.

    Since any access to a shared variable might cause interpreter state changes, and (presumably) accesses to a handle might cause even more state changes than just touching a scalar, the shared interpreter lock would have to be held for the duration of access to the handle, including during the I/O operations applied to said handle. Which means that every other thread that needs to do something to any shared variable, no matter how trivial, has to wait for some other thread's I/O operation to complete. Which might be a very long time...

    Which essentially means the handle can't be used from the shared interpreter (at least, not for anything more than bookkeeping purposes).

    Which leads me back to the PerlIO layer idea. If a shared scalar were associated with each handle, the handles might be treated as regular atoms (string/number literals) when passed between threads. The receiving thread would then need to instantiate the handle's context in its own private interpreter context. Which is just a fancier way of doing what we've already been doing, i.e., passing the fileno and re-opening in the receiving thread. Of course, the semantics of file operations get a bit confused at that point: passing a handle from Thread A to Thread B, wherein Thread B does a seek and lets Thread A know it's repositioned the file pointer, leaves Thread A stuck at the old file position, since Thread B has a distinctly new handle. Not an issue for stream handles, but block I/O can get confused.
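
    A minimal sketch of that existing workaround, assuming a perl built with ithreads: only the numeric descriptor is handed to the worker, which then dup()s it into a handle of its own. The anonymous temporary file and all variable names here are purely illustrative stand-ins for a real socket or file.

```perl
use strict;
use warnings;
use threads;

# Anonymous temporary file standing in for a socket or other real handle.
open my $fh, '+>', undef or die "open: $!";
print {$fh} "hello from the parent\n";
seek $fh, 0, 0 or die "seek: $!";    # flushes the write buffer and rewinds

# Only the descriptor number crosses the thread boundary, not the glob.
my $fd = fileno $fh;

my $thr = threads->create(sub {
    my ($fd) = @_;
    # '<&' dup()s the descriptor, so the worker gets its own PerlIO handle.
    open my $copy, '<&', $fd or die "dup: $!";
    my $line = <$copy>;
    close $copy;                     # closes only the worker's duplicate
    return $line;
}, $fd);

# $fh has to stay open until the worker has performed its dup().
my $line = $thr->join;
print $line;
close $fh;
```

    The parent deliberately keeps $fh alive until join() returns; letting it go out of scope earlier would close the descriptor before the worker could duplicate it.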

    And then there's the need to collect all the other layer info to re-instantiate the handle, which won't exist until 5.10 (maybe 5.8.9 ?)

    So ideally, the resuscitation of the handle in the receiving thread would actually perform a clone operation, rather than a re-open. But then things get hairy with respect to refcounts. (Thread B's handle reference goes out of scope, so the private interpreter invokes close() on it...and suddenly Thread A starts getting nasty errors when it tries to use the handle.)

    In summary: it's complicated.



      For the *{IO} portion of the glob, I would do exactly what we do now manually, dup() the IO handle and store it in the shared proxy. Whilst the underlying file/directory/socket/whatever would be shared, the internal state associated with it would be thread-specific.

      This is essentially identical to the situation when file/socket handles are shared between processes via fork. Each process can access the underlying entity, but has local internal state for things like file positions, directory positions etc. You're right about the confused semantics, but they are really no different than in the fork scenario.
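
      Which parts of that state end up private after a dup() can be checked empirically. A minimal sketch, assuming a POSIX-like system, where dup()ed descriptors share one kernel file offset (just as descriptors inherited across fork() do) while PerlIO buffers stay private to each handle. It uses sysseek/syswrite so that buffering does not blur the result; the names are illustrative.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR);

# Anonymous temporary file with ten bytes of content.
open my $orig, '+>', undef or die "open: $!";
syswrite($orig, "0123456789") or die "syswrite: $!";

# dup() the descriptor, as one would before handing it to another thread.
open my $dup, '<&', $orig or die "dup: $!";

# Move the duplicate using unbuffered calls only.
sysseek($dup, 4, SEEK_SET) or die "sysseek: $!";

# Ask the original handle where it is now ("+ 0" normalises "0 but true").
my $orig_pos = sysseek($orig, 0, SEEK_CUR) + 0;

# On POSIX systems this is expected to report 4: the offset moved with the dup.
print "original handle is at offset $orig_pos\n";
```

      Buffered reads and writes on either handle add a per-handle layer on top of this, which is where the confusing semantics for block I/O tend to come from.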

      The problems I was hoping for enlightenment on go much, much deeper than this. For example, globs have associated glob magic. The way threads::shared works (now) is by adding its own form of magic to the entity being shared. My initial thought was that there was a conflict between having two types of magic applied to a single entity, but I've discovered that this is not the case.

      Perl has long been able to have multiple types of magic applied to a single entity. See the mg_moremagic field of the MAGIC structure that hangs off the magic pointer of the SvPVMG. It allows for an arbitrary-length chain of magics to be applied to any entity.
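
      The chain can be observed from pure Perl with the core B module. A minimal sketch, not using threads::shared at all: it attaches two unrelated kinds of magic to one scalar (regex-position magic via pos(), and weak-reference backref magic via Scalar::Util::weaken) and then lists what is hanging off it. The choice of these two magics is only for illustration.

```perl
use strict;
use warnings;
use B ();
use Scalar::Util qw(weaken);

my $s = "abc";

pos($s) = 1;          # attaches regex-position magic (type 'g')
my $ref = \$s;
weaken($ref);         # attaches weak-reference backref magic (type '<') to $s

# B's MAGIC method walks the chain via mg_moremagic and returns one
# B::MAGIC object per link; TYPE gives the one-character magic type.
my @types = map { $_->TYPE } B::svref_2object(\$s)->MAGIC;

print "magic chain on \$s: @types\n";
```

      The same inspection applied to a shared variable on a threaded perl would show the threads::shared magic as further links in the same chain.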

      To verify this, I went into threads::shared and disabled the checks that produce the "Cannot share globs yet" error and, lo and behold, I can now share a glob, dup the *{IO} portion (still done externally for now), pass the shared glob to another thread, and it works.

      This makes the implementation of threaded servers very much easier, as there is no mucking around with fileno and holding onto copies of socket handles so that the socket isn't automatically closed (as the original goes out of scope) before the child thread has a chance to perform the dup(). So much simpler. Testing is limited so far, but it does work.

      But there is a further problem. HTTP::Daemon and probably others, hang objects off of the underlying socket glob produced by IO::Socket. That in itself is not a problem, you simply assign a reference to a shared hash to the *{HASH} slot at the same time as you assign a dup() of the socket to the *{IO} slot. And sure enough, everything (that I've tested so far) works.

      Where things go belly-up is that HTTP::Daemon also creates glob-based objects for the ClientConnection, also by using the *{HASH} slot, and that works too, except that it also stores a copy of the server globject in a hash value in the ClientConn globject. And once a shared glob has been stored in a hash value, it loses some or all of its magic.

      The underlying cause of that (as best as I can determine) seems to be that the magic handling code in threads::shared.xs only handles copying one additional level of magic (besides its own), and shared elements of shared aggregates require additional 'share magic'. That means that a shared glob stored in an element of a shared hash (or array) would need a chain of 3 types of magic, but shared.xs is only written to deal with 2 levels. And at that point things start failing in interesting ways.

      As I pointed out above, Perl's SvPVMGs are designed to handle an arbitrary-length chain of magics, and so (I reason) there is no fundamental reason why this couldn't be made to work. It just needs the appropriate degree of understanding and XS skills, and simple, thread-based servers would become a reality. Unfortunately, so far, my attempts to understand the application and management of magics have left me floundering, and those with the skills aren't interested enough in the use of threads to do the necessary.

      I'm at a loss as to how to take this further because the documentation of magics seems to be limited to a single paragraph in the perlguts docs and it isn't enough. As with all things XS, there does not seem to be a viable route forward in the acquisition of the required knowledge.


        Nice piece of detective work!

        I'm troubled by this statement:

        And once a shared glob has been stored in a hash value, it loses some or all of its magic.

        Presumably, the assignment to the 'httpd_daemon' element in the shared *HASH is going through the sharedsv_elem_mg_STORE() method, which should end up in sharedsv_scalar_store(). Have you tried sprinkling the latter with printfs to see what path it takes? I'd assume it's handled as an RV, but maybe something else is going on... (I have vague memories of a recent p5p posting regarding a threads::shared patch to address chained magic, but can't seem to find anything in the changelog...but make sure you're using the latest version.)

        FWIW I'm personally still not comfortable with just dup()'ing; I've experienced too many weird behaviors with both block and (especially) stream I/O when things get dup()'ed. I'm also concerned about the possibility of piling up dups if an app has a master thread repeatedly/arbitrarily passing file handles around to worker threads. But if the sharing magic can be fixed, then that problem can presumably be fixed by stealing the clone code for handles, and then (a) checking if the file already has context in the receiving thread and (b) cloning the state into the receiving thread at that point if it doesn't.

        (BTW: maybe this dialogue needs to be moved to the ithreads mailing list?)

