deprecated has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Fellow Monks. I'm writing an application to do something quite simple. Let's say there is a large network of servers that are all disconnected, or are grouped together in smaller subnets. They are all reachable from one another, but choose not to connect (IRC, for example). On all of these servers, there are various channels (also like IRC). On almost all of these servers there is a #linux. Since I got tired of finding my Linux-using friends, I devised a simple script to connect to all the servers, join the #linux channel, and mirror it to the server I am on.

Before anyone tells me that's an annoying thing to do, let's just tackle the code part of it. There are, at the moment, 201 of these servers. Let's say it takes only two seconds to connect to each of them. That's still just shy of seven minutes to connect to all of them.

Lincoln Stein's Napster.pm is doing the actual connecting here, and the module requires a threading perl, which I've got. I shouldn't have to wait for a response from each server after I send my login request; some of them will be slower than others. I'd like to read down the list (keys %client_servers_host_info), send a login request to all of them, and then deal with them as they get back to me, or, if necessary, destroy the connection. Instead, it takes 10 minutes to get to only 60 of the servers because I am waiting for every one of them. So my question is: is there some way I can do this like I would in bash (i.e., dothis & dothis & dothis)? Furthermore, since the module itself already makes use of threads, is it possible to thread a threaded sub?

So here is the code. $host_server_config{foo} is the relevant data for the server I am mirroring all the other servers to. It's a one-way mirror. %client_servers_host_info is basically just a long hash of "server port" pairs. "dacts" is just a sub that interfaces with the connect method in the module. Sorry if the variable names are a little confusing.

foreach my $client_server (keys %client_servers_host_info) {
    srand;
    my $extra = int(rand(256));
    warn $host_server_config{username} . $extra . "\n";
    $client_servers{$client_server} = dacts(
        $host_server_config{username} . $extra,
        $host_server_config{password},
        $client_server,
        $client_servers_host_info{$client_server}
    );
    $counter++;
    warn $counter;
}

Of course there is one other question here. I have had lots of people tell me how stupid it is to be using Perl threads at all. I know the threading implementation changed between 5.005 and 5.6.0, and I know that in 5.6 there are even two different flavors of threads. People on dalnet #perl think it's idiotic for me to use threads, but I am using threads because of the module. It seems to work very well for me when I have fewer than 400 threads running (yes, perl segfaults for me above that). I just read the interview with Larry in Opensource Developers Journal, and it sounds like Perl 6 is going to be a drastic departure. How can I write more portable, more stable, and long-lasting code that appears and behaves threadedly in Perl?

I know it's kind of a dense post. Thanks guys, gals.

deprecated

--
i am not cool enough to have a signature.


Replies are listed 'Best First'.
Re: Simultaneous writes and as-needed reads from sockets (or The State of Perl Threads...)
by Fastolfe (Vicar) on Jan 13, 2001 at 01:15 UTC
    This sounds like a perfect job for IO::Select. Unfortunately I've yet to find a comprehensive tutorial for managing several simultaneous non-blocking IO::Socket connections, but I've written stuff to do this in the past. So I can't really recommend any reading for you except some standard texts on writing network code using non-blocking sockets.

    Essentially, what I would do is loop over your servers and, for each one, create a new IO::Socket object, make it non-blocking, and dispatch a 'connect' request for it. Since it's non-blocking, you won't know if the connect succeeded or not until later. Repeat this for all of your sockets, and then enter a select loop (via IO::Select).
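
A minimal sketch of that connect step, assuming plain TCP servers and using only IO::Socket::INET and IO::Select. The sub name connect_all, the "host:port" target format, and the overall timeout are made up for illustration; none of this touches Napster.pm. The demo at the bottom talks to a throwaway listener on the loopback interface so that it runs without contacting a real server.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Socket qw(SO_ERROR);

# Hypothetical helper: start a non-blocking connect to every "host:port"
# target, then wait up to $timeout seconds overall for each to finish.
# Returns (\%status, \%live): a status string and, on success, the socket.
sub connect_all {
    my ($timeout, @targets) = @_;
    my (%status, %live, %name_of);
    my $pending = IO::Select->new;

    for my $target (@targets) {
        my ($host, $port) = split /:/, $target;
        # Blocking => 0 lets new() return before the handshake completes.
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Blocking => 0,
        );
        if (!$sock) {
            $status{$target} = "failed: $@";
            next;
        }
        $name_of{$sock} = $target;
        $pending->add($sock);
    }

    my $deadline = time + $timeout;
    while ($pending->count and (my $left = $deadline - time) > 0) {
        # A socket turns writable once its connect has finished, either way.
        for my $sock ($pending->can_write($left)) {
            my $target = $name_of{$sock};
            my $errno  = $sock->sockopt(SO_ERROR);
            if ($errno) {
                local $! = $errno;
                $status{$target} = "failed: $!";
            }
            else {
                $status{$target} = 'connected';
                $live{$target}   = $sock;
            }
            $pending->remove($sock);
        }
    }
    $status{ $name_of{$_} } = 'timed out' for $pending->handles;
    return (\%status, \%live);
}

# Demo against a local listener (an assumption made so the sketch is runnable).
my $listener = IO::Socket::INET->new(
    Listen    => 5,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen: $@";
my $target = '127.0.0.1:' . $listener->sockport;
my ($status, $live) = connect_all(5, $target);
print "$_: $status->{$_}\n" for sort keys %$status;
```

The point of the design is that the total wait is bounded by the slowest server rather than by the sum of all of them.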

    Keep data you're planning on sending in a buffer for that socket, select for writing those sockets that have data waiting to go out, write that data, and drop from the buffer the amount of data that was written.
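
The buffering described above might look roughly like this. It is a sketch under assumptions: %outbuf, queue_data and flush_pending are hypothetical names, the "JOIN #linux" payload is only a placeholder, and a socketpair stands in for a real server connection so the example runs on its own.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;
use Socket;

my %outbuf;    # pending outbound data, keyed by (stringified) handle

# Append data to a handle's outbound buffer without writing anything yet.
sub queue_data {
    my ($fh, $data) = @_;
    $outbuf{$fh} = '' unless defined $outbuf{$fh};
    $outbuf{$fh} .= $data;
}

# Select for writing only on handles that have data queued, write what the
# kernel accepts, and drop exactly that many bytes from the buffer.
# Returns the total number of bytes written in this pass.
sub flush_pending {
    my ($timeout, @handles) = @_;
    my @want = grep { defined $outbuf{$_} && length $outbuf{$_} } @handles;
    return 0 unless @want;

    my $written = 0;
    for my $fh (IO::Select->new(@want)->can_write($timeout)) {
        my $n = syswrite($fh, $outbuf{$fh});
        next unless defined $n;              # not writable after all; retry later
        substr($outbuf{$fh}, 0, $n, '');     # keep only the unsent remainder
        $written += $n;
    }
    return $written;
}

# Demo: a socketpair stands in for a connection to a server.
socketpair(my $near, my $far, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$near->blocking(0);
queue_data($near, "JOIN #linux\n");
my $sent = flush_pending(1, $near);
sysread($far, my $got, 4096);
print "sent $sent bytes, peer received: $got";
```

Because syswrite may accept only part of the buffer, the remainder stays queued and goes out on a later pass through the select loop.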

    Select for reading all of your sockets, process incoming data (being careful to preserve partial lines for next time), send it to whatever socket's input buffer it should go, etc.
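
For the read side, here is a minimal sketch of preserving partial lines between passes. The names %inbuf and read_ready_lines are hypothetical, and a socketpair again stands in for a server so the example is self-contained.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

my %inbuf;    # unprocessed inbound data, keyed by (stringified) handle

# Read whatever is available on each ready handle and return the complete
# lines as [ $handle, $line ] pairs. A trailing partial line stays in
# %inbuf until the rest of it arrives.
sub read_ready_lines {
    my ($select, $timeout) = @_;
    my @lines;
    for my $fh ($select->can_read($timeout)) {
        $inbuf{$fh} = '' unless defined $inbuf{$fh};
        my $n = sysread($fh, $inbuf{$fh}, 4096, length $inbuf{$fh});
        if (!$n) {    # EOF or read error: stop watching this handle
            $select->remove($fh);
            close $fh;
            next;
        }
        while ($inbuf{$fh} =~ s/^([^\n]*)\n//) {
            push @lines, [ $fh, $1 ];
        }
    }
    return @lines;
}

# Demo: the second line arrives split across two writes.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $watch = IO::Select->new($reader);

syswrite($writer, "first line\nsecond li");
my @first = read_ready_lines($watch, 1);

syswrite($writer, "ne\n");
my @second = read_ready_lines($watch, 1);

print "$_->[1]\n" for @first, @second;
```

Using sysread rather than the buffered readline matters here: mixing select with buffered reads can leave data sitting in Perl's buffer where select cannot see it.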

    Perhaps someone else can provide links to a good tutorial on building something like this.

    In addition, migrating to an event-based architecture (such as POE) might be useful as well. I suspect a lot of this is "built-in".

      Hiya Fastolfe...

      I checked out POE (when you said architecture I got to thinking "hardware"; actually it's a Perl module available from CPAN, for anyone curious), and I'm going to give it a lookover this evening.

      What you said about IO::Select is something everyone else has said to me. However, IO::Select is what the module is using. I am not actually doing any of the socket handling myself. I wanted to be able to just launch a lot of processes and not deal with them until they had something to say. Like, for instance, in bash, this:

      $ cat /etc/services & cat /etc/sendmail.cf

      is going to get me lines to the terminal from both files, roughly intermingled. I don't see why I shouldn't be able to do this from within Perl, as what I am actually doing is not very intensive. If this clarifies things and you're able to suggest something else, by all means do. Otherwise I'm going to have a look over POE and see how relevant it is and whether I can actually grok it.

      Thanks again,
      deprecated.

      --
      i am not cool enough to have a signature.

        and just send a login request to all of them, and then deal with them as they get back to me, or, if necessary, destroy the connection. Instead, it takes 10 minutes to get to only 60 of the servers because I am waiting for every one of them.

        This doesn't sound like select-based behavior. Perhaps you're mixing select with standard blocking calls? A good non-blocking select-based implementation of something like this should be able to handle dozens of network sockets simultaneously without a significant amount of delay. If you're hitting multiple servers, you should be able to number your simultaneous connections in the hundreds, and on any decent system your bottleneck will be your network connection, not the app (unless you're doing a lot of processing with the inbound data, I guess).

        I mean, there are two ways you can go about this. I have no idea how Napster.pm does its thing. If you say it's working with select, fine. I don't get what the purpose of multiple threads is, in that case, but whatever. It's not important. With select, you work with Perl filehandles. These can be network sockets, files, STDIN/STDOUT, pipes, whatever. If you want to use open($S, "-|") to fork off a child process, IO::Select would be happy to use $S as a filehandle to watch. You can repeat this a dozen times to get the behavior you're looking for, with each child doing an exec or whatever it is you want. The select call will be happy to tell you which file handles have data waiting to be read.

        So basically, going back to your problem, it sounds like you neither want to nor can code any direct hooks into the way this other module does its network handling. So realistically you can't make it "go faster" when it's working with multiple servers. What, then, do you plan to do? Do you want to fork off your process into 40 sub-processes, each one devoted to a single server? If so, select is still very much an option. Instead of selecting against network sockets, select against pipes (such as that perlipc version of open above), and process inbound data from each of your children in turn.
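
A rough sketch of that fork-and-select arrangement, using the forking form of open documented in perlipc. The sub name run_children and the two toy jobs are assumptions for illustration; in the real program each child would presumably make the module's connect call for one server and print whatever the parent needs to know.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical helper: fork one child per job ([ $name, $coderef ]), with
# each child's STDOUT piped back to the parent, and collect the output
# lines per job as they arrive. Returns a hash: name => [ lines ].
sub run_children {
    my (@jobs) = @_;
    my $select = IO::Select->new;
    my (%name_of, %buffer, %lines);

    for my $job (@jobs) {
        my ($name, $code) = @$job;
        my $pid = open(my $fh, '-|');    # forks; the child writes to the pipe
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            $| = 1;                      # unbuffered STDOUT in the child
            $code->();
            POSIX::_exit(0);             # leave without running parent cleanup
        }
        $name_of{$fh} = $name;
        $buffer{$fh}  = '';
        $select->add($fh);
    }

    while ($select->count) {
        for my $fh ($select->can_read) {
            my $n = sysread($fh, $buffer{$fh}, 4096, length $buffer{$fh});
            if (!$n) {                   # child closed its end (or read error)
                $select->remove($fh);
                close $fh;               # also waits for that child
                next;
            }
            while ($buffer{$fh} =~ s/^([^\n]*)\n//) {
                push @{ $lines{ $name_of{$fh} } }, $1;
            }
        }
    }
    return %lines;
}

# Demo: the slow child does not hold up output from the fast one.
my %lines = run_children(
    [ fast => sub { print "fast $_\n" for 1 .. 3 } ],
    [ slow => sub { sleep 1; print "slow done\n" } ],
);
print "$_: @{ $lines{$_} }\n" for sort keys %lines;
```

This is the closest Perl analogue of "dothis & dothis & dothis", with the difference that the parent captures each child's output separately instead of letting it intermingle on the terminal.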

        If something like this is the route you want to take, I highly recommend reading perlipc. You can always just fork and let each child write to STDOUT, but it's difficult to *capture* that information in a controlled way without using true pipes and mediating between them by using select, so that you can avoid blocking while waiting for data from one of them.

        If I've missed the boat on this, if your plan to break these tasks up is altogether more bizarre than anything I've mentioned, by all means let me know.

Re: Simultaneous writes and as-needed reads from sockets (or The State of Perl Threads...)
by repson (Chaplain) on Jan 13, 2001 at 11:48 UTC
    While reading the MP3::Napster documentation I thought that hacking the module to enable the Tk mode without needing to provide a Tk object would allow you to use non-blocking connects, which seem to be what you might want. I don't know how easy it would be, or if your perl skills are up to it, but it should be possible and may be the best solution.