in reply to Passing globs between threads

This works fine for me.

#! perl -slw
use strict;
use IO::File;
use threads;

sub thread{
    my $fh = shift;
    print for <$fh>;
    return;
}

my $io = new IO::File( 'junk', 'r' ) or die $!;
my $thread = threads->create( \&thread, $io );
$thread->join;
$io->close;

__END__
P:\test>395373
This is junk
and some more junk
and yet more junk
this is line 4 of junk

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^2: Passing globs between threads
by conrad (Beadle) on Sep 30, 2004 at 21:28 UTC
    Sorry, should have submitted example code, I wasn't being clear: I'm passing objects to the thread through shared variables after it's been created. This code demonstrates what I mean - note that it won't complete successfully as long as the *STDIN line remains uncommented. But I'm looking for the best way of making something like that work!

    #!/usr/bin/perl -w
    use strict;
    use threads;
    use threads::shared;
    use IO::Handle;
    use Thread::Semaphore;

    # Here we've a choice - use Storable to freeze and thaw stuff, or
    # these two bogus implementations which do nothing. This illustrates
    # (when you use the bogus code) how you cannot simply drop blessed
    # objects into shared objects. If you use Storable, blessed objects
    # can be passed across while they're frozen, but file handles (for
    # example) still can't.
    use Storable qw{freeze thaw};
    #sub freeze { ${$_[0]} }
    #sub thaw { \shift }

    # Job queue and semaphore; the semaphore gets signalled every time an
    # object is pushed onto the queue.
    my @queue : shared;
    my $queue_ctrl = Thread::Semaphore->new(0);

    sub thread {
      while(1) {
        $queue_ctrl->down; # Wait for something to be queued
        lock @queue;
        last unless @queue; # Terminate on empty packet
        my $object = ${ thaw shift @queue };

        # Would normally receive something relatively complex and do some
        # work with it here.
        print "Thread received $object.\n";
      }
    }

    # Note that I'm creating the thread *first* and *afterwards* passing
    # objects across (through the queue) for it to work with..
    my $thr = threads->new(\&thread);

    # Things we want to try passing through to the thread:
    my @stuff = (
      # A basic type, works no problem with or without freeze/thaw
      '"hi there"',

      # A blessed reference, must be frozen and thawed
      bless({}, 'Foo'),

      # We can pass STDIN's file descriptor as an integer
      STDIN->fileno,

      # Have to comment this next line out for successful completion - but
      # this (or some equivalent) is what I would like to achieve!
      *STDIN,
    );

    # Queue stuff and see what happens
    foreach my $obj ( @stuff ) {
      { # Queue $obj and signal $queue_ctrl
        lock @queue;
        print "\nEnqueueing $obj...\n";
        push @queue, freeze \$obj;
        $queue_ctrl->up;
      }
      sleep 1; # Optional, but gives the thread a chance to grab the queue
    }

    # Terminate - signal $queue_ctrl without enqueueing anything.
    $queue_ctrl->up;
    $thr->join;

      The first thing to realise is that raw filehandles and sockets are process global. So you don't need, and indeed cannot, share them between threads in the threads::shared sense, as they are effectively already shared by all threads in the process.

      Note: I'm talking about the OS & C-runtime concept of filehandles and sockets, not anything in the IO::* group of modules. The IO::* objects are peculiar beasts: at some level they are like ordinary perl objects, which makes them difficult, if not impossible, to use safely across threads, but they do not behave entirely like ordinary objects.

      So, the problem is how to transfer an open handle between threads (which at a higher level seems like a bad design to me, but I'll get back to that), given that threads::shared won't let you share a GLOB nor even a 5.8.x lexical scalar that is currently being used as a GLOB-like entity.

      After a little kicking around, I found a way. I'm not yet sure that it is a good thing to do, but I'll show you how anyway and hope that if someone else out there knows why it should not be done, they'll speak up.

      The trick is to pass the fileno of the open GLOB to the thread and then use the special form of open, open FH, "<&=$fileno" or die $!, to re-open the handle within the thread before using it.

      #! perl -slw
      use strict;
      use IO::File;
      use threads;
      use Thread::Queue;

      sub thread{
          my $Q = shift;
          my $fno = $Q->dequeue;
          open FH, "<&=$fno" or die $!;
          print for <FH>;
          return;
      }

      my $Q = new Thread::Queue;
      my $thread = threads->create( \&thread, $Q );
      my $io = IO::File->new( 'junk', 'r' ) or die $!;
      $Q->enqueue( fileno $io );
      $thread->join;
      $io->close;

      __END__
      P:\test>395373
      This is junk
      and some more junk
      and yet more junk
      this is line 4 of junk

      However, this is not a complete solution. Currently (probably because of a bug, I think, though possibly by design), if you read from the handle in the thread where you opened it, then pass the fileno to another thread, re-open it there and attempt to read some more from it, it fails. No errors, no warnings. Just no input.

      I've read and re-read perlfunc:open and perlopentut:Re-Opening-Files-(dups), and I believe that the "&=$fileno" syntax should allow this... but it doesn't. If you need to be able to do this, you'll need to take it up with the p5p guys and get their wisdom.
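One possible explanation for the silent "no input" result, offered here as an assumption and not something confirmed in this thread, is read buffering: the first buffered read may pull the whole of a small file into the original handle's buffer, leaving the shared descriptor's offset at end-of-file. The sketch below reproduces that in a single thread (no threads needed, since only the descriptor is shared) and shows one way to resynchronise the offset before handing the fileno over. The temporary file and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small four-line scratch file (stand-in for 'junk').
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 4;
close $tmp or die "close: $!";

# Read one line through a normal buffered handle.
open(my $first, '<', $path) or die "open: $!";
my $line1 = <$first>;

# Naive hand-over: a second handle on the same descriptor starts at the
# descriptor's offset, which the buffered read has already advanced.
open(my $naive, '<&=', fileno($first)) or die "fdopen: $!";
my @naive = <$naive>;

# Resynchronise: move the descriptor's offset back to the logical
# position reported by tell(), then re-open again.
defined sysseek($first, tell($first), 0) or die "sysseek: $!";
open(my $resumed, '<&=', fileno($first)) or die "fdopen: $!";
my @rest = <$resumed>;

printf "naive handle read %d line(s), resumed handle read %d line(s)\n",
    scalar @naive, scalar @rest;
```

If this is indeed the cause, an equivalent approach would be to pass both fileno and tell() to the receiving thread and call seek() there after the re-open. With files larger than the read buffer, the naive handle would presumably resume at the buffer boundary instead of returning nothing.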

      Now, getting back to design. Whilst moving objects around between threads by freezing and thawing them is possible, it feels awfully hokey to me. Basically, if your design is done correctly, there should be no need to create duplicates of objects in different threads.

      Doing so is opening yourself up to a world of grief.

      These duplicate objects will be uncoordinated. Once you freeze an object, it is exactly that; frozen. Any subsequent changes you make will not be reflected in the copy that you reconstruct elsewhere. If it was trivial, or even if it was possible with a reasonable degree of difficulty to coordinate and synchronise objects across threads, Perl would do that for you. The fact that the clever guys that got iThreads this far did not step up to the plate and do this already, almost certainly means that you should not be trying to do this either.
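A minimal sketch of the "frozen is frozen" point, using Storable as in the example earlier in the thread. The %job structure and its keys are made up for illustration; the only claim is that a change made after freeze does not reach the thawed copy.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical job description that one thread wants to hand to another.
my %job = ( name => 'bundle0', inputs => 3 );

# Freezing takes a snapshot of the structure at this moment.
my $frozen = freeze(\%job);

# A later change to the original is not part of the snapshot...
$job{inputs} = 5;

# ...so the copy rebuilt elsewhere still carries the old value.
my $copy = thaw($frozen);

print "original inputs: $job{inputs}\n";
print "thawed inputs:   $copy->{inputs}\n";
```

The two structures are independent from the moment of the freeze, which is why any further coordination between them has to be arranged explicitly.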

      The fact is that I appear to have done as much coding of (i)threads as anyone I am aware of, and I have yet to see a usefully iThreadable problem I couldn't (almost trivially) solve. And so far, I have never needed to share either objects (or IO handles) between threads.

      But don't take my word for it. My knowledge only extends as far as I have tried to go. It would be good to see someone else pushing the boundaries of what's possible.

      If you could describe the problem that you are trying to solve at the application level, I'd be most interested to take a look.


        Hi again & thanks for the detailed reply..

        What I'm trying to do.. A service which will do "stuff" (it's a generic architecture to be customised to particular applications, so I really mean stuff) to bundles of streams - essentially, a client will tell it "here's a bunch of input streams, and a corresponding bunch of output streams, process 'em using some-mechanism-or-other as you pump data from the inputs to the outputs". Each group of inputs and outputs is collated in some way for purposes of error correction, and the service will process multiple bundles simultaneously. Little example diagram here, with two bundles: the first (A->B) is unidirectional with three inputs and only two outputs, while the second (C<->D) is bidirectional with one stream at one end and three at the other (the actual stuff that multiplexes/demultiplexes doesn't really matter for this question).

        A0--\            /B0
        A1---+->stuff>--+        Bundle 0
        A2--/            \B1

                       /--D0
        C0---<stuff>--+---D1     Bundle 1
                       \--D2

        ... Bundle N ...

        I'm using Perl 'cos it facilitates rapid prototyping (this is research work), it has good network handling, and is easy to integrate with CGI, web services, and third-party applications (all of which are desirable in this case).

        The architecture I've gone with, due to the relatively heavy weight of Perl's threads, is to set up a single worker thread which processes all of the bundles in a select() loop, and then have the main thread obtain, validate and submit control directives (such as adding new bundles or modifying existing ones). In an ideal world, I'd just turn each control directive into a closure and hand it off to the worker thread to deal with in its own good time. In another language that might be easy while all the other stuff I've mentioned might be hard; in Perl, the other stuff is easy and this, while not hard to hack (I'm already doing it by passing filenos across) seems hard to do nicely.

        The reason for separating the control invocation from the worker thread is that I envisage a variety of different control styles, e.g. web page, web service interface, and straight program API. I'd rather not get that all munged up with the I/O and processing work...

        So I escape your wise warning re: freezing objects because the control thread destroys its copies of them as soon as they're frozen and passed off to the worker. It's clearly not efficient, but my purpose for the time being is clarity and flexibility, which whole-object transmission gives me. Passing instructions through as unblessed shared string/list/hash structures was more error-prone.

        Now on to the file descriptor messaging: at the moment I do one of two things: either I pass across coordinates (such as "hostname:port") and the worker initiates a socket connection itself, or I pass across the file descriptor of a socket/pipe/filehandle and the worker tries to reconstruct the original object using IO::xxx->new_from_fd(). What would be nice would be to be able to just pass across IO::xxx objects; since that's not possible without extending Storable (I am tempted, would make everything tidy again), the necessary thing seems to be to deconstruct the objects into class (necessary since I can usefully half-close socket connections but not files for example) and file descriptor. Your open("&=") trick is analogous to the new_from_fd one, and difficult because it too needs to know the opening mode of the original object (you have to specify "<&=", ">>&=", etc.). So I guess my ultimate questions are:

        1. Is there a better (read: clean and object-oriented) way of telling another Perl thread about an IO::Handle than deconstructing it to its file descriptor, transmitting that, and then reconstructing it in the receiving thread? I think that the answer to this is no.
        2. If not, then given a file descriptor $fd, is there no better way of reconstructing it accurately into an object than examining the entrails of fcntl($fd, F_GETFL) (to obtain R/W/append flags) and passing my conclusions from that back to IO::xxx->new_from_fd? It seems faintly wasteful that new_from_fd (and fdopen) need this information when it seems to be embedded in the file descriptor already..
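The deconstruct/transmit/reconstruct approach from the two questions above might be sketched as follows. This is only an illustration under stated assumptions: a POSIX-like system where fcntl F_GETFL is available, and a made-up helper name, mode_for_handle, which is not part of IO::Handle. Since Perl's fcntl operates on a filehandle, the sketch reads the flags from the original object on the sending side and transmits class, descriptor and mode together; the receiving side is simulated in the same thread to keep the example short.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_RDONLY O_WRONLY O_RDWR O_APPEND);
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of IO::Handle): map the handle's status
# flags to a Perl-style open mode that new_from_fd() accepts.
sub mode_for_handle {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    die "fcntl(F_GETFL) failed: $!" unless defined $flags;
    $flags += 0;    # fcntl may return the string "0 but true"

    my $access = $flags & (O_RDONLY | O_WRONLY | O_RDWR);
    my $append = $flags & O_APPEND;
    return $append ? '+>>' : '+<' if $access == O_RDWR;
    return $append ? '>>'  : '>'  if $access == O_WRONLY;
    return '<';
}

# Sending side: open a scratch file for append and reduce the object to
# three plain scalars, which could travel through a shared queue.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";
my $out = IO::File->new($path, 'a') or die "open $path: $!";
my @descriptor = (ref($out), fileno($out), mode_for_handle($out));

# Receiving side: rebuild an object of the same class on the same fd.
my ($class, $fd, $mode) = @descriptor;
my $rebuilt = $class->new_from_fd($fd, $mode)
    or die "new_from_fd: $!";
$rebuilt->print("written via rebuilt handle\n");
$rebuilt->flush;

print "class=$class fd=$fd mode=$mode\n";
```

Both objects refer to the same descriptor, so in a real design the sender would need to keep its handle open until the receiver has finished with the rebuilt one.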

        "If it was trivial, or even if it was possible with a reasonable degree of difficulty to coordinate and synchronise objects across threads, Perl would do that for you. The fact that the clever guys that got iThreads this far did not step up to the plate and do this already, almost certainly means that you should not be trying to do this either."

        Yes, it's easy to synchronize objects (at least Storable ones) across threads. RFC 677 (yes, the one from the IETF) tells you how.