rapier1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a server that will copy data from a network interface to multiple sockets. Normally the right way to do this would be to fork a separate process for each connection request. Unfortunately, only one process can read from the network interface at a time. If I do fork, simultaneous reads will be attempted and I'll dump core. Not a good solution.

Will threads help with this? I was thinking that I can have one thread that actually does the reads from the interface and another that handles the writes to the sockets. However, I understand that there are significant issues with threads and threading support isn't even compiled into perl by default.

Should I explore using threads? Will I need to recompile perl to make use of the Thread module? Should I just continue to use what I have now (a non-blocking, non-forking server which might not scale well)? One thing to keep in mind is that this is running on an SMP machine. Will that complicate things?

Re: Threads, Forks, or Nothing?
by Aighearach (Initiate) on Aug 15, 2001 at 23:31 UTC

    My opinion on threads is, if you're not sure exactly why you're choosing them, don't.

    What I use for networking interfaces that need to be powerful is IO::Select and Tie::RefHash. Here is an example:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket;
    use IO::Select;
    use Tie::RefHash;

    my %connections;
    tie %connections, qw( Tie::RefHash );   # so we can use refs as hash keys

    my %opts = ( port => 7777 );            # pick whatever port you need
    my $server = IO::Socket::INET->new(
        Listen    => SOMAXCONN,
        LocalPort => $opts{port},
        Reuse     => 1,
        Proto     => 'tcp',
    ) or die "can't open connection: $!";
    $server->blocking( 0 );

    my $select = IO::Select->new( $server );

    while ('eternity') {
        # another way is to let it block, instead of passing can_read a
        # timeout, but I usually need to check state, maybe send new
        # output, so I need to loop often
        foreach my $con ( $select->can_read(1) ) {
            if ( $con == $server ) {
                # looks like we have a new user connection
                my $client = $server->accept or next;
                $client->blocking( 0 );
                $select->add( $client );
                # whatever you want to store about the client can go in
                # the value, I like using a hashref so I'm not limited
                $connections{$client}{ip} = $client->peerhost;
            }
            elsif ( exists $connections{$con} ) {
                # I guess it's a connected user sending us data ...
            }
            else {
                # whoops, unknown filehandle
            }
        }
    }

    An excellent example of Tie::RefHash and IO::Select can be found in the Perl IRC Daemon.
    --
    Snazzy tagline here

Re: Threads, Forks, or Nothing?
by blakem (Monsignor) on Aug 15, 2001 at 23:22 UTC
    Do you have the Perl Cookbook? (if not, get one through the monastery) It has a whole chapter (#17) devoted to writing network servers using various multitasking methods. I found it very helpful when I needed to write one.

    If you have the Cookbook lying around you should definitely read that chapter. Otherwise I'd highly recommend buying it, since it is a very good resource.

    -Blake

There's always user space threads
by mugwumpjism (Hermit) on Aug 16, 2001 at 02:31 UTC

    There are three ways you can achieve multiprocessing:

    1. Heavyweight processes, using pipes or other IPC, as mentioned above. Awkward.
    2. Lightweight processes, using Perl threading, hoping that everything you use is re-entrant and thread safe. Dangerous.
    3. User space processes, where code surrenders control of the processor to other pieces of code once it has had a "fair turn". POE is an excellent framework for achieving this without getting too much of a headache.

    You don't need kernel threads to write multithreaded applications. You just need to break your problem up into chunks, build a little "run-queue" of waiting chunks of code, make sure all your system calls are non-blocking, and have a loop that picks the first chunk off the queue and runs it. In many cases, this type of threading may actually outperform kernel-enforced lightweight processes. Sure, you might still hang somewhere if you've a bug, but that's always the case, isn't it?
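    The run-queue idea above can be sketched in a few lines of plain Perl. This is a hypothetical toy, with two fake "reader"/"writer" tasks standing in for real chunks of work (POE provides the industrial-strength version of this):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tiny cooperative "run-queue": each task is a closure that does one
# small chunk of work and returns true while it still has work left.
my @log;    # record of who ran, to show the interleaving
my @queue = (
    do { my $n = 0; sub { push @log, "read"  . ++$n; $n < 3 } },
    do { my $n = 0; sub { push @log, "write" . ++$n; $n < 2 } },
);

while (@queue) {
    my $task = shift @queue;          # pick the first waiting chunk
    push @queue, $task if $task->();  # requeue it until it is finished
}

print "@log\n";   # read1 write1 read2 write2 read3
```

    Note that the interleaving happens without any kernel help at all: each closure gives up the processor simply by returning, which is exactly the "fair turn" surrender described above.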

Re: Threads, Forks, or Nothing?
by clintp (Curate) on Aug 15, 2001 at 23:26 UTC
    I'm writing a server that will copy data from a network interface to multiple sockets. Normally the right way to do this would be to fork a separate process for each connection request. Unfortunately, only one process can read from the network interface at a time. If I do fork, simultaneous reads will be attempted and I'll dump core. Not a good solution.
    Pardon? Why not have one reader and multiple writers; the reader and the writers speak to each other over another channel to pass the actual message forward (using message queues, sockets, shared memory, semaphores, filesystems, etc...)?
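    A minimal sketch of that one-reader/many-writers shape, using nothing but fork and pipes. The "cells" and the writer count here are made up for illustration, and the writer children just echo to stdout instead of writing to real sockets:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# One reader process fans data out over plain pipes; each pre-forked
# writer child forwards whatever it receives (here, just to stdout).
my @writers;
for my $id ( 1 .. 2 ) {
    pipe( my $rd, my $wr ) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {                    # child: one writer
        close $wr;
        while ( my $line = <$rd> ) {
            print "writer $id got: $line";
        }
        exit 0;
    }
    close $rd;                            # parent keeps the write end
    $wr->autoflush(1);
    push @writers, $wr;
}

# The lone reader: pretend these lines came off the interface.
for my $cell ( "cell-A\n", "cell-B\n" ) {
    print {$_} $cell for @writers;        # fan out to every writer
}
close $_ for @writers;                    # EOF lets the children exit

my $reaped = 0;
$reaped++ while wait() != -1;             # collect the writer children
```

    Only the parent ever touches the "interface", so there is never a competing read, which is the whole point of the layout.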

    As far as fork/thread ... I'm partial to forking. But I'm a Unix snob.

      One reader and multiple writers is exactly what I am trying to do. The idea with threads was to have one thread as the reader and another thread to control the writers using a shared memory space. What you suggested is essentially what I am doing now...
      In pseudo code:

          while (1)
              if accept (non-blocking)
                  initialize writer
                  add ptr to writer hash
              if %writer
                  read_cell (atm cells)
                  foreach connection in %writer
                      if lifetime of writer exceeded
                          undef $writer{connection}
                          next
                      write cell

      and then back through the loop.
      The problem is that initializing a writer takes a non-zero amount of time. During this period we can't read any cells off the interface, and at OC3 and OC12 speeds (which is what we are dealing with) that can be a significant amount of data lost. So the ideal solution would be to offload the writer initialization and handling to another thread so that existing writers wouldn't have their data flow interrupted.

      The goal is to have a server that can gracefully handle gigabit speeds and more than 5 writers at a time.

      Also, as I stated, I do not think forking can work. I can only have one reader on an interface at a time. If there is some way to only fork a subroutine I'm all ears (eyes, whatever).

        The trick there is to have your children ready before you accept. Basically you fork multiple times, then the parent sits on the socket. The moment you get a connection you just start writing to the children. In this case you would probably just have the children blocking on reads from the pipe to the parent.

        The Apache webserver does this to improve its response time to connections. If you look at the process table while Apache is running, you will see MAX_SERVERS + 1 processes. That's the parent with MAX_SERVERS children standing by.
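        For what it's worth, a sketch of pre-forking in the Apache style, where the children are forked before any connection arrives and each blocks in accept() on the shared listener (a slight variant of the parent-dispatches scheme described above). The port number and pool size are arbitrary choices for the demo, and the "work" is just a greeting:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Create the listener once, *then* fork the pool: the children are
# ready before the first connection ever shows up.
my $server = IO::Socket::INET->new(
    Listen    => SOMAXCONN,
    LocalAddr => '127.0.0.1',
    LocalPort => 7777,            # arbitrary demo port
    Reuse     => 1,
    Proto     => 'tcp',
) or die "listen: $!";

my @kids;
for ( 1 .. 2 ) {                  # MAX_SERVERS = 2 for the demo
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {
        while ( my $client = $server->accept ) {   # child blocks here
            print {$client} "hello from child $$\n";
            close $client;
        }
        exit 0;
    }
    push @kids, $pid;
}

# Parent: nothing left to do per-connection; just supervise the pool.
# (A real server would respawn children that die, as Apache does.)
sleep 1;                          # let the pool settle, then probe it
my $probe = IO::Socket::INET->new( PeerAddr => '127.0.0.1:7777' )
    or die "connect: $!";
my $reply = <$probe>;
print "client saw: $reply";
kill 'TERM', @kids;               # tear the pool down
waitpid( $_, 0 ) for @kids;
```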

        ____________________
        Jeremy
        I didn't believe in evil until I dated it.

Re: Threads, Forks, or Nothing?
by nakor (Novice) on Aug 16, 2001 at 01:12 UTC
    Simply put, don't use threads. If you are at all hesitant about using them, there is a depressingly large chance that you will subtly screw up somewhere and cause Perl to coredump at inopportune moments. (Murphy is watching, after all!) What I would probably do is actually have two servers: one that reads from the interface and writes to a single pipe or socket, and one (fork()ing) server that reads from that pipe/socket and multiplexes it out to all the consumers. That way, you get safe "multithreading" and no worries about races or competition for the interface.
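    That two-server split can be sketched with a socketpair standing in for the pipe/socket between the two halves. The "cells" are fabricated, and the distribution side here just slurps them rather than forking real writers:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# One process plays the "interface reader" and owns the device
# exclusively; it pushes everything down a single socketpair to the
# distribution side, which is then free to fork per consumer.
socketpair( my $dist, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;
if ( $pid == 0 ) {                            # the reader process
    close $dist;
    print {$reader} "cell-$_\n" for 1 .. 3;   # stand-in for interface reads
    close $reader;
    exit 0;
}

# The distribution side: the only thing that ever touches the reader's
# stream, so there is no competition for the interface itself.
close $reader;
my @cells = <$dist>;
print "distributor got: @cells";
waitpid( $pid, 0 );
```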