PerlMonks  

Sockets: TMTOWTDI (BWWIB)?

by Ionizor (Pilgrim)
on Dec 16, 2002 at 01:32 UTC ( [id://220094]=perlquestion )

Ionizor has asked for the wisdom of the Perl Monks concerning the following question:

A little bit of background first: I'm thinking of writing a MUD server (well, it's more like an extensible MUD engine, much as the Gecko engine is to web browsers). I'm thinking of writing it in Perl for a variety of reasons, then backporting parts of it to C if speed becomes an issue. Unfortunately, right now I'm stuck at the very beginning.

I've looked over examples of sockets code, I read the IPC chapter in the Camel, looked at a few C primers on networking and sockets and did my research. As far as I can tell there are three main ways to deal with multiple concurrent connections and I don't know which one would be best for the job I'm trying to do.

From what I've seen, you can use fork() to split off a child for each open connection, use select() to switch between sockets, or use a threaded model.

Can anyone tell me what the advantages and disadvantages to each of these models would be, aside from the obvious (threads require threadsafe modules and so forth)? Why should I choose one model over another? What are the limitations of each model? What are the benefits? What are the common gotchas and pitfalls for implementing each of the models? Am I going to run into portability problems trying to move between Win32/Cygwin and Unix?

To sum up my problem, There's More Than One Way To Do It But Which Way Is Best?

Much obliged.

Replies are listed 'Best First'.
Re: Sockets: TMTOWTDI (BWWIB)?
by pfaut (Priest) on Dec 16, 2002 at 02:15 UTC

    First, I'd recommend a good TCP/IP book. I learned from the Comer/Stevens series. I would recommend you pick up one of the following: Internetworking with TCP/IP, Volume III, BSD version, Windows version, linux/POSIX version. http://www.bookpool.com is a good place to get technical books at a discount. Somebody else here might have some other book recommendations but I've always heard good reviews about this series.

    As far as handling connections goes, you have the following choices:

    • Accept a connection, read the request, generate and send the response, close the socket. This only works for a very lightly loaded server, small amounts of data, responses that can be generated quickly. This is the simplest way to put a server together.
    • Fork a new process for each connection. On unix-like systems, this works out well because it doesn't take much to fork. On some other systems, creating a new process by forking takes a lot of time or resources and is highly discouraged. Since it sounds like you're working on Windows, you probably don't want to go this way. This method works if each client can be handled individually. In your case, I think your server is going to be a multi-client controller so your clients need to pass information between them through your server. If you go the fork route, you'll also be messing with inter-process communication (IPC) on your server machine to do this. This takes a lot of coordination (locking, queuing, shared memory, etc.). You might run into system limits on the number of child processes or a limit on processes per user.
    • select() or poll() would allow one process to handle multiple clients asynchronously. The select() or poll() call is told what handles to inspect and returns a list of handles that have data ready or are ready to accept data. This is a fairly efficient way of doing what you need. Some operating systems, I think Windows included, may have a limit on the number of network handles you can use with these calls. Also, some operating systems won't allow you to specify non-socket handles which could cause problems if you also need to synchronize on other things. Here you might run into limits on handle counts.
    • Create a new thread for each client. This method is similar to the multi-process model. You still need to figure out how to pass information between threads and how to synchronize but the solutions in a threaded model are simpler than in a multi-process model. You'll have the same handle count limits here as in the select() model.
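    As a rough illustration of the select() option above, here is a minimal sketch using the core IO::Select and IO::Socket::INET modules. The helper names make_listener and poll_once are made up for this example, the "echo:" reply is just a stand-in for real request handling, and the loopback address and OS-chosen port are assumptions for experimenting locally.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Open a listening TCP socket on the loopback interface. Port 0 asks the
# OS for any free port (an assumption for experiments; a real server
# would use a fixed port).
sub make_listener {
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen: $@";
    return $listener;
}

# One pass of the select() loop: wait up to $timeout seconds, accept any
# new client, and echo back whatever the ready clients sent.
# Returns the number of handles that were ready.
sub poll_once {
    my ($listener, $select, $timeout) = @_;
    my @ready = $select->can_read($timeout);
    for my $fh (@ready) {
        if ($fh == $listener) {
            my $client = $listener->accept or next;
            $select->add($client);
            next;
        }
        my $n = sysread($fh, my $buf, 4096);
        if (!$n) {                  # 0 = EOF, undef = error: drop the client
            $select->remove($fh);
            close $fh;
            next;
        }
        syswrite($fh, "echo: $buf");   # a real server must handle short writes
    }
    return scalar @ready;
}

# A server would then run something like:
#   my $listener = make_listener();
#   my $select   = IO::Select->new($listener);
#   poll_once($listener, $select, undef) while 1;
```

    Everything happens in one process, so clients can share data freely, but any handler that blocks will stall every other client.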

    Which way is best is for you to decide. What works best on one platform may not work well at all on another. Try to separate your network interface from the rest of your code so that you can try implementing it in different ways.
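    One way to picture that separation, as a sketch only: keep the game logic in a class that never sees a socket and simply returns a list of [client_id, text] pairs for whatever network layer is in use to deliver. The MudCore package and its method names are hypothetical, invented for this example.

```perl
use strict;
use warnings;

# Hypothetical game-logic layer. It never touches a socket: it receives
# events and returns a list of [client_id, text] pairs to deliver.
package MudCore;

sub new {
    my ($class) = @_;
    return bless { clients => {} }, $class;
}

sub on_connect {
    my ($self, $id) = @_;
    $self->{clients}{$id} = 1;
    return $self->_broadcast("client $id has connected\n");
}

sub on_line {
    my ($self, $id, $line) = @_;
    return $self->_broadcast("client $id says: $line\n");
}

sub on_disconnect {
    my ($self, $id) = @_;
    delete $self->{clients}{$id};
    return $self->_broadcast("client $id has left\n");
}

# Address one message to every connected client, in a stable order.
sub _broadcast {
    my ($self, $text) = @_;
    return map { [ $_, $text ] } sort keys %{ $self->{clients} };
}

package main;
```

    A select(), threaded, or forking front end would each call the same three methods, so swapping one for another should not touch the game code.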

    As a disclaimer, I've written many network servers but usually in C/C++. I've never tried to do one in perl.

    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';

      Thank you. I found this thread very helpful and informative. Unfortunately the book you've noted is a bit out of my price range and my Christmas gifts have already been bought.

      I've eliminated the "open, read, write, close" model because the overhead is too high. I want the server to be able to handle a load of a few hundred concurrent users, even if it doesn't ever scale that high in actuality.

      Given the problems with fork() in Windows I've eliminated it as well as I don't currently have a Unix machine to test on. This may change in future.

      This leaves me with threads and select(). I am still weighing the benefits and drawbacks but I'm leaning towards select() switching. I've taken your advice and the model I've drawn up for the server has the networking core abstracted with an API so I can switch the underlying implementation as it becomes prudent to do so.

      Is there a specific reason you've never written network servers in Perl? (i.e. am I missing something important?)

      Thank you once again for sharing your perspective.

        Is there a specific reason you've never written network servers in Perl? (i.e. am I missing something important?)

        You're not missing anything. Most of the network servers I've written have been done as part of my job. The situations called for coding in C/C++. Most of my personal stuff has been written to run as web-based applications and are written in perl.

        --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
Re: Sockets: TMTOWTDI (BWWIB)?
by MarkM (Curate) on Dec 16, 2002 at 04:12 UTC

    The question of 'which server model should I use' is a very complicated one, and I doubt you will find a consistent answer from different people for 'which way is best?'

    For the most basic overview:

    1. One process per connection: Medium capacity. Simple to implement and maintain. On UNIX platforms, data can be shared between clients in a restricted manner using shared memory segments. Another alternative would be a database backend. The active connection limit is restricted by the maximum number of processes that can run on the system at one time. If one process dies due to an implementation error, the other processes remain active and unaffected.
    2. One thread per connection: Medium->Large capacity. Data is easily shared between clients. The active connection limit is restricted by the maximum number of threads that can run on the system at one time. If Perl ITHREADS are used, special consideration should be taken to understand the cost of creating a thread (Perl data structure copying, etc.). The cost can be minimized by using a thread pool instead of creating a new thread for each connection.
    3. One process, switched connections: Medium->Large capacity if implemented correctly. Correct implementation is difficult, as each connection must be managed by a state machine-like device, event handlers need to be broken into small pieces, and some events need to be prioritized. Data is easily shared between clients. Implementation errors have the greatest chance of affecting all active connections when compared to any of the other models.
    4. Multiple processes, connection loop: This is the model used by the Apache web server. The principle is that multiple processes all wait on accept(), and the first one to succeed gets the connection and handles it to completion. An outer process ensures that sufficient processes are waiting on the listening socket. This model is very resistant to most implementation errors, and is very efficient. Where other models require multiple system calls to handle a connection (accept()/fork(), accept()/pthread_create(), select()/accept()), this model requires only one system call: accept(). Client data must be accessed as per the 'One process per connection' model.
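    A minimal sketch of model 4 on a unix-like system, assuming a real fork(): the parent opens the listening socket, then forks workers that all block in accept() on it. The prefork function, its arguments, and the "served by pid" reply are made up for illustration. Each worker here exits after a fixed number of connections so the example terminates; the outer process that replaces finished workers is left out.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Start $workers children that all block in accept() on the same listening
# socket. Each child serves $max_requests connections and then exits.
# Returns the child pids to the parent.
sub prefork {
    my ($listener, $workers, $max_requests) = @_;
    my @pids;
    for (1 .. $workers) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                        # child
            for (1 .. $max_requests) {
                my $client = $listener->accept or last;
                print {$client} "served by pid $$\n";
                close $client;
            }
            POSIX::_exit(0);    # leave without running the parent's END blocks
        }
        push @pids, $pid;                       # parent
    }
    return @pids;
}
```

    The parent would normally sit in a waitpid() loop and fork a replacement whenever a worker exits.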

    In terms of portability, all models should function to different degrees. My personal attachment is the 'One process, switched connections' model, however, due to its complexity, I would not recommend it to those personally unfamiliar with the issues involved. If you have not ever used non-blocking I/O, I recommend that the 'One process, switched connections' model not be considered.
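    To give a feel for the per-connection state that the switched model needs, here is a small sketch of one piece of it: with non-blocking reads, a command can arrive split across several chunks, so each connection keeps a buffer and only acts on complete lines. The names new_conn and feed are hypothetical, and line-based commands are an assumption about the protocol.

```perl
use strict;
use warnings;

# Per-connection state: bytes received so far that do not yet form a line.
sub new_conn { return { inbuf => '' } }

# Append a freshly read chunk and return every complete line it finishes,
# with the line ending (LF or CRLF) stripped. Leftover bytes stay buffered.
sub feed {
    my ($conn, $chunk) = @_;
    $conn->{inbuf} .= $chunk;
    my @lines;
    while ($conn->{inbuf} =~ s/^(.*?)\r?\n//) {
        push @lines, $1;
    }
    return @lines;
}
```

    Output needs the same treatment in the other direction (queue what could not be written yet), which is a good part of why this model takes more work than it first appears.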

    This description is far from complete, but it should provide you with rough expectations. Notice that I did not label any of these solutions 'Large capacity'. Any implementation that truly needs 'Large capacity' should not be written in Perl. By 'Large capacity', I mean 10000+ connections per minute sustained.

    WIN32 NOTE: Take into account that fork() is implemented using Perl ITHREADS under WIN32. Therefore, although all multiple process models should function under WIN32, it is probably better to consider implementing the solutions using Perl threads to avoid the 'emulation' layer.
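    For the thread route, a thread pool along the lines mentioned in item 2 might look roughly like this sketch, which uses the threads and Thread::Queue modules. The run_pool function is hypothetical, and uc() stands in for real per-request work. The modules are loaded at run time only so the file still compiles on a perl built without ithreads.

```perl
use strict;
use warnings;
use Config;

# Run @jobs through a fixed pool of $workers threads and return the
# results sorted. Threads are created once, not once per job.
sub run_pool {
    my ($workers, @jobs) = @_;
    die "this perl was built without ithreads\n" unless $Config{useithreads};
    require threads;            # must be loaded before Thread::Queue
    require Thread::Queue;

    my $todo = Thread::Queue->new;
    my $done = Thread::Queue->new;

    my @pool = map {
        threads->create(sub {
            # Each worker pulls jobs until it sees an undef marker.
            while (defined(my $job = $todo->dequeue)) {
                $done->enqueue(uc $job);    # stand-in for real work
            }
        });
    } 1 .. $workers;

    $todo->enqueue(@jobs);
    $todo->enqueue((undef) x $workers);     # one stop marker per worker
    $_->join for @pool;

    my @results;
    while (defined(my $r = $done->dequeue_nb)) {
        push @results, $r;
    }
    @results = sort @results;
    return @results;
}
```

    In a server, the jobs would be accepted connections or parsed commands rather than strings, and the pool would stay alive for the life of the process.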

      Regarding familiarity with the issues involved in the switched connections model, do you think the books that were recommended would be a sufficient teacher? At this point, the switched connections model is the most appealing to me.

      The server won't be handling 10000+ connections. There's a way to scare me to wakefulness. Heh.

        I am positive that the books mentioned will be useful. Many people have recommended them to me over the years.

        For myself, I found reading books or articles that describe the concepts to be useful, and a good head start, but I found the actual experience of determining the problems in my own implementations to be more valuable in the long run. If you have the time, try implementing both, and then optimizing each to the best of your ability. Subject your code to a peer review either here, or within another Perl community, or people who work with you. You won't lose from the experience.

        If you don't have the time, I would still recommend either one of the process models, or the thread models, over switched connections. Since you are going to be using WIN32, the thread model is probably best.

        It took three generations to get my current pure-perl event loop and socket management code to the level it is now. With standard Intel hardware of yesteryear (400-800 MHz, single CPU), it is able to handle 1000+ active connections in 2 seconds. These three generations represent several months of work (at least a few weeks of solid work mixed with odd complaints regarding production environment behaviour or misbehaviour). This is why I recommend against it. If you still want to pursue this path, you may cut some corners by using an existing event loop such as the one used by the Tk module.

        I apologize, but I am not able to release the code I speak of at the current point in time. It is owned by my employer, and all that... I am more than happy to comment on code that you submit, though.

Re: Sockets: TMTOWTDI (BWWIB)?
by pg (Canon) on Dec 16, 2002 at 02:01 UTC
    I would choose threads over processes unless threads are not available, which is no longer the case in Perl. When both threads and processes can provide concurrent programming, why threads over processes? My two-penny answer:
    • Creating a process is expensive.
      1. It takes time. A call into the OS is needed. If it triggers process rescheduling, OS context switching will be involved.
      2. It takes memory. The entire process would be replicated.
      3. Interprocess communication and synchronization are also quite expensive.
    • Threads can be created without replicating the entire process. Some, if not all, of the work of creating threads is done in user space rather than kernel space. Threads can be synchronized by just monitoring a variable, in other words, within the user address space of the program. When processes synchronize, it usually involves expensive operations that trap into the kernel.
    • Better performance.
      It is very OS dependent. For example, on Linux creating a new process is not more expensive than creating a thread (after all, internally creating a process and creating a thread are almost the same thing on Linux). Nor does it take more memory, because of copy-on-write. Performance is not necessarily better with threads, because they have the overhead of synchronization and locking which separate processes do not have.

      Add to this that some OSes don't have a good threads implementation yet (the STABLE branch of FreeBSD comes to mind) and Perl's support for them is still very experimental (even in 5.8.0).

      --
      Ilya Martynov, ilya@iponweb.net
      CTO IPonWEB (UK) Ltd
      Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
      Personal website - http://martynov.org

Re: Sockets: TMTOWTDI (BWWIB)?
by Nitrox (Chaplain) on Dec 16, 2002 at 03:23 UTC
    You may also want to take a peek at PerlMUD.

    -Nitrox

      I did actually find this before I posted here but I'm reluctant to look at his code because it doesn't say what terms he's licensing it under. I checked the PerlMUD site and the documentation that comes with the server. I don't want to be accused of plagiarism later.
(jeffa) Re: Sockets: TMTOWTDI (BWWIB)?
by jeffa (Bishop) on Dec 16, 2002 at 15:44 UTC

      Sweet.

      Coincidentally, I have a Safari subscription. It's an excellent service for anyone who doesn't know about it. Looks like that copy is closer than I thought!

      Thanks much.
