Re^4: Passing globs between threads

Hi again, and thanks for the detailed reply.

What I'm trying to do: a service which will do "stuff" (it's a generic architecture to be customised to particular applications, so I really mean stuff) to bundles of streams. Essentially, a client will tell it "here's a bunch of input streams, and a corresponding bunch of output streams, process 'em using some-mechanism-or-other as you pump data from the inputs to the outputs". Each group of inputs and outputs is collated in some way for purposes of error correction, and the service will process multiple bundles simultaneously. Little example diagram here, with two bundles: the first (A->B) is unidirectional with three inputs and only two outputs, while the second (C<->D) is bidirectional with one stream at one end and three at the other (the actual stuff that multiplexes/demultiplexes doesn't really matter for this question).

A0--\            /B0
A1---+->stuff>--+         Bundle 0
A2--/            \B1

               /--D0
C0---<stuff>--+---D1      Bundle 1
               \--D2

                ...       Bundle N
                ...

I'm using Perl 'cos it facilitates rapid prototyping (this is research work), it has good network handling, and is easy to integrate with CGI, web services, and third-party applications (all of which are desirable in this case).

The architecture I've gone with, due to the relatively heavy weight of Perl's threads, is to set up a single worker thread which processes all of the bundles in a select() loop, and then have the main thread obtain, validate and submit control directives (such as adding new bundles or modifying existing ones). In an ideal world, I'd just turn each control directive into a closure and hand it off to the worker thread to deal with in its own good time. In another language that might be easy while all the other stuff I've mentioned might be hard; in Perl, the other stuff is easy and this, while not hard to hack (I'm already doing it by passing filenos across) seems hard to do nicely.
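For concreteness, one pass of the worker's select() loop might look roughly like the sketch below. The routing-table layout and the name pump_once are made up for illustration, the "stuff" is reduced to a straight copy, and partial writes and EOF handling are left out.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical routing table: input fileno => { in => $handle, out => [ $handles ] }.
# One pass: wait for readable inputs, copy whatever arrived to each output.
sub pump_once {
    my ($routes, $timeout) = @_;
    my $sel   = IO::Select->new(map { $_->{in} } values %$routes);
    my $moved = 0;
    for my $in ($sel->can_read($timeout)) {
        my $route = $routes->{ fileno($in) };
        my $n = sysread($in, my $buf, 4096);
        next unless $n;    # EOF or error: a real worker would retire the stream here
        for my $out (@{ $route->{out} }) {
            # Assumption: short writes are ignored in this sketch.
            defined syswrite($out, $buf) or die "write failed: $!";
        }
        $moved += $n;
    }
    return $moved;
}

# Demo with pipes standing in for the real streams.
pipe(my $in_r,  my $in_w)  or die "pipe: $!";
pipe(my $out_r, my $out_w) or die "pipe: $!";
my %routes = (fileno($in_r) => { in => $in_r, out => [$out_w] });

syswrite($in_w, "hello");
my $moved = pump_once(\%routes, 1);
sysread($out_r, my $got, $moved);
print "moved $moved bytes: $got\n";
```

In the real service the main thread would be adding and removing entries in that table via the control directives, which is where the handle-passing problem comes from.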

The reason for separating the control invocation from the worker thread is that I envisage a variety of different control styles, e.g. web page, web service interface, and straight program API. I'd rather not get that all munged up with the I/O and processing work...

So I escape your wise warning re: freezing objects because the control thread destroys its copies of them as soon as they're frozen and passed off to the worker. It's clearly not efficient, but my purpose for the time being is clarity and flexibility, which whole-object transmission gives me. Passing instructions through as unblessed shared string/list/hash structures was more error-prone.

Now on to the file descriptor messaging: at the moment I do one of two things: either I pass across coordinates (such as "hostname:port") and the worker initiates a socket connection itself, or I pass across the file descriptor of a socket/pipe/filehandle and the worker tries to reconstruct the original object using IO::xxx->new_from_fd(). What would be nice would be to be able to just pass across IO::xxx objects; since that's not possible without extending Storable (I am tempted, would make everything tidy again), the necessary thing seems to be to deconstruct the objects into class (necessary since I can usefully half-close socket connections but not files for example) and file descriptor. Your open("&=") trick is analogous to the new_from_fd one, and difficult because it too needs to know the opening mode of the original object (you have to specify "<&=", ">>&=", etc.). So I guess my ultimate questions are:

1. Is there a better (read: clean and object-oriented) way of telling another Perl thread about an IO::Handle than deconstructing it to its file descriptor, transmitting that, and then reconstructing it in the receiving thread? I think that the answer to this is no.
2. If not, then given a file descriptor $fd, is there no better way of reconstructing it accurately into an object than examining the entrails of fcntl($fd, F_GETFL) (to obtain R/W/append flags) and passing my conclusions from that back to IO::xxx->new_from_fd? It seems faintly wasteful that new_from_fd (and fdopen) need this information when it seems to be embedded in the file descriptor already.
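To make the deconstruct/reconstruct round trip concrete, here is a minimal sketch. It assumes a POSIX-like platform where F_GETFL is available, and that the flags are read on the sending side while the original handle still exists (Perl's fcntl wants a filehandle and three arguments, not a bare descriptor). The names describe_handle and rebuild_handle are hypothetical, and the demo runs in a single thread.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_ACCMODE O_WRONLY O_RDWR O_APPEND);
use IO::File;
use File::Temp qw(tempfile);

# Sender side: reduce a handle to plain data (class, fd number, mode string).
sub describe_handle {
    my ($io) = @_;
    my $flags = fcntl($io, F_GETFL, 0);
    defined $flags or die "fcntl(F_GETFL) failed: $!";
    $flags += 0;                          # fcntl returns "0 but true" for zero
    my $access = $flags & O_ACCMODE;
    my $append = $flags & O_APPEND;
    my $mode = $access == O_RDWR   ? ($append ? 'a+' : 'r+')
             : $access == O_WRONLY ? ($append ? 'a'  : 'w')
             :                       'r';
    return { class => ref($io), fd => fileno($io), mode => $mode };
}

# Receiver side: the class must already be loaded wherever this runs.
sub rebuild_handle {
    my ($desc) = @_;
    my $io = $desc->{class}->new_from_fd($desc->{fd}, $desc->{mode})
        or die "cannot rebuild $desc->{class} on fd $desc->{fd}: $!";
    return $io;
}

# Demo: describe an append-mode file, rebuild it, write through the copy.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
my $orig = IO::File->new($path, 'a') or die "open $path: $!";
my $desc = describe_handle($orig);
my $copy = rebuild_handle($desc);
$copy->print("written through the rebuilt handle\n");
$copy->flush;
print "$desc->{class} fd=$desc->{fd} mode=$desc->{mode}\n";
```

The descriptor hash is plain data, so it can be frozen or queued like any other directive. It does not remove the need to look at the flags; it only moves that step to the side that still holds the original object.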


Replies are listed 'Best First'.

Re^5: Passing globs between threads
by BrowserUk (Patriarch) on Oct 02, 2004 at 06:17 UTC

> So I escape your wise warning re: freezing objects because the control thread destroys its copies of them as soon as they're frozen and passed off to the worker.

In effect, you are not sharing an object between threads; you are actually just passing an object template that you want to be constructed by the receiving thread. The only benefit you are gaining from doing it this way is the encapsulation of the object's class into the queued (Storable) string. Though I guess it does allow you to build the object instance by calling multiple methods prior to freezing it, which means that the object is validated before you pass it.
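As a rough illustration of that round trip, the sketch below freezes a blessed object and thaws it again; the class name travels inside the frozen string. My::AddBundle and its methods are invented for the example, and the hand-off between threads (for instance via a queue) is left out, so everything here runs in one thread.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

{
    package My::AddBundle;    # hypothetical directive class, not from the thread

    sub new        { my ($class) = @_; return bless { inputs => [], outputs => [] }, $class }
    sub add_input  { my ($self, $addr) = @_; push @{ $self->{inputs} },  $addr; return $self }
    sub add_output { my ($self, $addr) = @_; push @{ $self->{outputs} }, $addr; return $self }
    sub describe {
        my ($self) = @_;
        return scalar(@{ $self->{inputs} }) . " in, " . scalar(@{ $self->{outputs} }) . " out";
    }
}

# Control side: build the object with ordinary method calls, then flatten it.
my $directive = My::AddBundle->new
    ->add_input('hostA:9000')
    ->add_input('hostA:9001')
    ->add_output('hostB:9000');
my $frozen = freeze($directive);
undef $directive;             # the control side drops its copy

# Worker side: thaw returns a blessed copy; the class code must be loaded here too.
my $rebuilt = thaw($frozen);
print ref($rebuilt), ": ", $rebuilt->describe, "\n";
```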

I think that passing a hash or an array containing the parameters to be used in the constructor in the destination thread--along with some convention of passing the class of the object as a named parameter in the hash or as the first element of the array--would be just as clean, and probably somewhat more efficient. For one thing, it would allow you to avoid loading the code for every class of object into the main thread as well as every receiving thread.
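Something along these lines, as a sketch only: the spec layout (class and args keys) and the name build_from_spec are assumptions made for the example, and IO::Select merely stands in for whatever class the worker would really construct. The class is loaded on demand at the receiving end, so the sender never needs its code.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Receiver side: load the class named in the spec on demand, then construct it.
sub build_from_spec {
    my ($spec) = @_;
    my $class = $spec->{class};
    $class =~ /\A\w+(?:::\w+)*\z/ or die "bad class name: $class";
    (my $file = "$class.pm") =~ s{::}{/}g;
    require $file;
    return $class->new(@{ $spec->{args} || [] });
}

# Sender side: plain unblessed data, flattened for the trip.
my $spec = { class => 'IO::Select', args => [] };
my $wire = freeze($spec);

# Receiver side: thaw the spec and build the real object from it.
my $obj = build_from_spec(thaw($wire));
print "built a ", ref($obj), "\n";
```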

Then again, I guess it does simplify the re-construction at the receiving end: Storable does all the work for you. I haven't benchmarked it, but as long as you're not trying to use the objects concurrently from multiple threads, it should be fine.

On the filehandle stuff, using a filemode of 'r+' and/or '+< &=$fileno', appears to let you read and write to the file from either thread, but it's not quite right. I'm convinced that the semantics of using the "&=$fileno" open is screwed up somehow--at least when combined with threads--but I'm not sure that I understand what the semantics should be in non-threaded code, so it's difficult to tell. It could just be another "not quite POSIX behaviour" win32 thing? Sorry, but I can't be much help there.
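For reference, the one part of the non-threaded behaviour that is documented is the difference between the two forms: "&=" gives a new Perl handle on the same descriptor (like fdopen), while "&" asks for a dup and so a new descriptor. The sketch below only checks the descriptor numbers in a single thread; it says nothing about what happens once threads are involved.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($orig, $path) = tempfile(UNLINK => 1);    # File::Temp opens this read/write
my $fd = fileno($orig);

# "&=" : alias the existing descriptor, no new fd is allocated.
open(my $alias, '+<&=', $fd) or die "fdopen-style open failed: $!";

# "&"  : dup the descriptor, so this handle gets an fd of its own.
open(my $dup, '+<&', $fd) or die "dup-style open failed: $!";

printf "original fd %d, alias fd %d, dup fd %d\n",
    $fd, fileno($alias), fileno($dup);
```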

It might be worth taking up the problem as a Storable/IO::* limitation with the p5p guys. Their greater understanding may see the reason/cause for it, and they may be able to suggest something.

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
